ANLP: Schedule and Course Materials
Below is the planned schedule and list of topics for 2025-26, but details may change.
Links for Weeks 1-11 will start working on the Sunday morning at the start of each week.
- Before Week 1: Preparation steps to take. Available now! Please visit this page to help you prepare for the course.
- Week 1: Language as data: structure and statistics. Including: levels of structure and ambiguity, corpora, Zipf's law, morphology across languages, BPE tokenization.
- Week 2: N-gram models. Including: models and parameters, n-gram language models, training and evaluating, smoothing, and sampling.
- Week 3: Classification and lexical semantics. Including: examples of text classification, multinomial logistic regression models and training, word senses and relations, WordNet, distributional semantics.
- Week 4: Word embeddings and neural networks. Including: dense word embeddings (word2vec), semantic similarity measures and evaluation, linear separability, multi-layer perceptron model and training.
- Week 5: Algorithmic bias, language variation, and RNNs. Including: algorithmic bias, protected characteristics, dialects and varieties, linguistic discrimination; recurrent neural network language models, long-distance dependencies.
- Week 6: Attention and Transformers. Including: sequence-to-sequence RNNs, attention, self-attention and Transformer blocks, parallelization, positional embeddings.
- Week 7: Advanced tokenization and LLMs. Including: LLM pretraining and fine-tuning, self-supervised objectives, revisiting tokenization and sampling, evaluation datasets and metrics for LLMs.
- Week 8: Masked language models and prompting. Including: BERT, Sentence-BERT, in-context learning, chain-of-thought reasoning.
- Week 9: Scaling laws and instruction tuning.
- Week 10: Alignment. Including: reinforcement learning from human feedback (RLHF), other topics TBD.
- Weeks 11+: Exam revision. No new material.
License
All rights reserved, The University of Edinburgh.