Natural Language Processing
About
This course introduces students to both foundational and modern techniques in Natural Language Processing (NLP). Covering everything from classic text processing and statistical language models to current deep learning approaches, the course aims to equip students with the practical skills and theoretical insight needed to build and analyze state-of-the-art NLP systems.
Syllabus
- NLP Basics: tokenization, text preprocessing, text representations
- Text & Language Models: embeddings, n-gram models, RNNs, LSTMs, seq2seq, attention
- Transformers & LLMs: Transformer, pre-training (MLM/CLM), prompting, fine-tuning, PEFT
- Scaling & Optimization: distributed training, MoE, KV-cache, Flash Attention, efficient inference, quantization
- Retrieval & Agents: Information Retrieval, RAG, agent-based systems
- Post-training: alignment, RLHF, DPO
Coursework
The course includes three practical assignments and an oral exam. Each assignment must be completed within two weeks and is worth up to 10 points; a penalty of 1 point is applied for each day of delay (for example, an assignment submitted two days late can earn at most 8 points). Both the oral exam and the practical assignments are mandatory: each must be completed to pass the course.
Grading
Final grade = 0.3 × (oral exam grade) + 0.7 × (average score across the practical assignments)
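As a worked example of the rules above, here is a minimal Python sketch, assuming the oral exam grade and the assignment scores share the same 10-point scale; the function names and example numbers are hypothetical.

```python
def assignment_score(points_earned: float, days_late: int) -> float:
    # One assignment is worth up to 10 points; each day of delay costs 1 point.
    return max(0.0, min(points_earned, 10.0) - days_late)

def final_grade(oral_grade: float, assignment_scores: list[float]) -> float:
    # Final grade = 0.3 * oral exam grade + 0.7 * average assignment score.
    average = sum(assignment_scores) / len(assignment_scores)
    return 0.3 * oral_grade + 0.7 * average

# Hypothetical scores: three assignments, the second submitted 2 days late.
scores = [assignment_score(10, 0), assignment_score(8, 2), assignment_score(9, 0)]
print(final_grade(9, scores))  # 0.3 * 9 + 0.7 * (25 / 3) ≈ 8.53
```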
Prerequisites
- Probability Theory and Statistics
- Machine Learning
- Python
- Basic knowledge of NLP