Networks and Text Analysis
About
The course covers data mining methods and technologies based on the concepts of similarity, proximity, and analogy. The idea of similarity is inherent in human thinking and has given rise to a whole range of approaches to all the fundamental tasks of data mining. The course focuses on classification, regression, clustering, and imputation of missing data. It presents the theoretical basis for constructing, implementing, and analyzing a wide range of data mining models and methods. Topics include methods for constructing and computing similarity functions, reconciling similarities defined on different sets of objects, and synthesizing new ways of comparing objects from existing ones. The course also covers a set of technologies for the efficient representation and processing of metric information by computing systems. It examines heuristic data models that describe the initial information about the objects to be recognized using various implementations of the concept of similarity, together with the problems that arise when implementing these models. Special data structures and algorithms for efficiently tuning and using these models are also studied.
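As an illustration of similarity-based classification, the following is a minimal sketch of a k-nearest-neighbours classifier with a pluggable distance function. It is not taken from the course materials: the names `euclidean` and `knn_classify`, the toy training set, and the choice of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_classify(
    query: Point,
    train: Sequence[Tuple[Point, str]],
    k: int = 3,
    distance: Callable[[Point, Point], float] = euclidean,
) -> str:
    """Return the majority label among the k training points closest to query."""
    # Sort the labelled points by their distance to the query and keep the first k.
    neighbours: List[Tuple[Point, str]] = sorted(
        train, key=lambda item: distance(query, item[0])
    )[:k]
    # Majority vote over the labels of the selected neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labelled "a" and "b".
train = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]
label = knn_classify((0.2, 0.1), train, k=3)
```

Because the distance is passed in as a parameter, the same sketch can be reused with any other similarity or distance function without changing the classifier itself.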
Syllabus
- Main approaches to the problem of similarity.
- Classical definition of metric and metric space.
- Local metrics and their extension to the entire space.
- Geometric subsets of general metric spaces.
- Examples of metric spaces.
- Classification of similarity functions.
- Characteristics of metrics.
- Metric transformations.
- Implementation of metrics.
- The principle of self-organization.
- Metrics on finite sets.
- Decomposition of MC by finite systems of MC.
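For the topics on the classical definition of a metric and on metrics on finite sets, the following minimal sketch checks the standard metric axioms (non-negativity, identity of indiscernibles, symmetry, triangle inequality) for a distance matrix on a finite set. The function name `is_metric`, the tolerance parameter, and the example matrices are assumptions made for illustration, not part of the course materials.

```python
from itertools import product
from typing import Sequence


def is_metric(d: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Check whether the square matrix d defines a metric on {0, ..., n-1}."""
    n = len(d)
    for i, j in product(range(n), repeat=2):
        if d[i][j] < -tol:
            return False  # non-negativity violated
        if i == j and abs(d[i][j]) > tol:
            return False  # d(x, x) must be 0
        if i != j and d[i][j] <= tol:
            return False  # distinct points must be at positive distance
        if abs(d[i][j] - d[j][i]) > tol:
            return False  # symmetry violated
    for i, j, k in product(range(n), repeat=3):
        if d[i][k] > d[i][j] + d[j][k] + tol:
            return False  # triangle inequality violated
    return True


# Shortest-path distances on a path graph with three vertices: a metric.
path_distances = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

# Same matrix with d(0, 2) = 3 > d(0, 1) + d(1, 2): not a metric.
broken_distances = [
    [0, 1, 3],
    [1, 0, 1],
    [3, 1, 0],
]

path_ok = is_metric(path_distances)
broken_ok = is_metric(broken_distances)
```

The triangle-inequality check enumerates all triples, so it runs in cubic time in the number of points. This is acceptable for small finite sets but would need a different approach for large ones.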
Labworks
None.
Grading
Exam based on course materials.
Prerequisites
Machine learning, algorithms and data structures.