Probabilistic Topic Models
About
This course covers probabilistic topic modeling of text document collections. A topic model identifies the topics present in a large text collection and the topics to which each document belongs. Topic models make it possible to search for texts by meaning rather than by keywords, and to build a new generation of information retrieval systems based on the paradigm of semantic exploratory search. The course considers topic models for classification, categorization, segmentation, and summarization of natural language texts, as well as for recommender systems, analysis of bank transaction data, and analysis of biomedical signals. It develops a multi-criteria approach to building models with specified properties: additive regularization of topic models (ARTM). The approach is based on regularizing ill-posed stochastic matrix factorization problems. Particular attention is paid to linguistic regularization methods for modeling text coherence. Students are expected to run numerical experiments on synthetic and real data using the BigARTM topic modeling library.
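To make the idea of additive regularization concrete, here is a minimal NumPy sketch of a regularized EM iteration for the factorization of a document-word count matrix into word-in-topic probabilities (phi) and topic-in-document probabilities (theta). It is an illustration written for this description, not the BigARTM API and not necessarily the exact algorithm taught in the course. The function names `artm_em` and `normalize_columns` and the parameters `beta` and `alpha` are assumptions. The only regularizer shown is the simple smoothing/sparsing one, which adds a constant to the expected counts before normalization: positive values smooth the distributions, negative values make them sparser, and zero gives plain PLSA.

```python
import numpy as np


def normalize_columns(x):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    s = x.sum(axis=0, keepdims=True)
    return np.divide(x, s, out=np.zeros_like(x), where=s > 0)


def artm_em(n_dw, num_topics, beta=0.0, alpha=0.0, num_iters=50, seed=0):
    """Regularized EM sketch for a topic model (hypothetical helper).

    n_dw: (D, W) array of word counts per document.
    beta, alpha: additive terms for phi and theta (assumed regularizer:
        > 0 smooths, < 0 sparsifies, 0 is plain PLSA).
    Returns phi of shape (W, T) and theta of shape (T, D).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = normalize_columns(rng.random((num_words, num_topics)))
    theta = normalize_columns(rng.random((num_topics, num_docs)))
    eps = 1e-12  # guards against division by zero

    for _ in range(num_iters):
        # E-step, folded into matrix form: p(w|d) = sum_t phi_wt * theta_td
        p_wd = phi @ theta                          # (W, D)
        ratio = n_dw.T / np.maximum(p_wd, eps)      # n_dw / p(w|d)
        # Expected counts n_wt and n_td under p(t|d,w)
        n_wt = phi * (ratio @ theta.T)              # (W, T)
        n_td = theta * (phi.T @ ratio)              # (T, D)
        # M-step: add the regularizer term, clip at zero, renormalize
        phi = normalize_columns(np.maximum(n_wt + beta, 0.0))
        theta = normalize_columns(np.maximum(n_td + alpha, 0.0))
    return phi, theta


# Toy collection: two documents use words 0-1, two use words 2-3.
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 0, 0],
                   [0, 0, 5, 4],
                   [0, 0, 4, 5]], dtype=float)
phi, theta = artm_em(counts, num_topics=2)
phi_sparse, theta_sparse = artm_em(counts, num_topics=2, beta=-0.1)
```

In this sketch the regularizer enters only through the additive terms in the M-step, which is what allows several criteria to be combined by summing their contributions. A library such as BigARTM is the appropriate tool for real experiments.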
Syllabus
- The topic modeling problem.
- Online EM algorithm and regularizers.
- Exploratory information search.
- Assessment of the quality of topic models.
- BigARTM and basic tools.
- EM algorithm theory.
- Bayesian training of the LDA model.
- Topic models of word co-occurrence.
- Dependency analysis.
- Multimodal topic models.
- Modeling the local context.
- Summarization and visualization.
Labworks
None.
Grading
To pass the course, students must complete individual practical assignments.
Prerequisites
Linear algebra, calculus (mathematical analysis), probability theory.