Natural Language Processing
About
This course introduces students to both foundational and modern techniques in Natural Language Processing (NLP). Covering everything from classic text processing and statistical language models to current deep learning approaches, the course aims to equip students with the practical skills and theoretical insight needed to build and analyze state-of-the-art NLP systems.
Syllabus
- NLP Basics: tokenization, text preprocessing, text representations
- Text & Language Models: embeddings, n-gram models, RNNs, LSTMs, seq2seq, attention
- Transformers & LLMs: Transformer, pre-training (MLM/CLM), prompting, fine-tuning, PEFT
- Scaling & Optimization: distributed training, MoE, KV-cache, Flash Attention, efficient inference, quantization
- Retrieval & Agents: Information Retrieval, RAG, agent-based systems
- Post-training: alignment, RLHF, DPO
Coursework
The course includes three practical assignments and an oral exam. Each assignment must be completed within two weeks and is worth up to 10 points; a penalty of 1 point is applied for each day of delay (for example, an assignment submitted two days late can earn at most 8 points). Both the oral exam and the practical assignments are mandatory: each must be completed to pass the course.
Grading
Final grade = 0.3 × (oral exam grade) + 0.7 × (average score across the practical assignments)
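As a worked example of the rules above, here is a minimal Python sketch, assuming the oral exam grade and the assignment scores share the same 10-point scale; the function names and example numbers are hypothetical.

```python
def assignment_score(points_earned: float, days_late: int) -> float:
    # One assignment is worth up to 10 points; each day of delay costs 1 point.
    return max(0.0, min(points_earned, 10.0) - days_late)

def final_grade(oral_grade: float, assignment_scores: list[float]) -> float:
    # Final grade = 0.3 * oral exam grade + 0.7 * average assignment score.
    average = sum(assignment_scores) / len(assignment_scores)
    return 0.3 * oral_grade + 0.7 * average

# Hypothetical scores: three assignments, the second submitted 2 days late.
scores = [assignment_score(10, 0), assignment_score(8, 2), assignment_score(9, 0)]
print(final_grade(9, scores))  # 0.3 * 9 + 0.7 * (25 / 3) ≈ 8.53
```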
Prerequisites
- Probability Theory and Statistics
- Machine Learning
- Python
- Basic knowledge of NLP