ANLP: Schedule and Course Materials
Below is the planned schedule and list of topics for 2025-26, but details may change.
Links for Weeks 1-11 will start working on the Sunday morning at the start of each week.
- Before Week 1: Preparation steps to take. Available now! Please visit this page to help you prepare for the course.
- Week 1: Language as data: structure and statistics. Including: levels of structure and ambiguity, corpora, Zipf's law, morphology across languages, BPE tokenization.
- Week 2: N-gram models. Including: models and parameters, n-gram language models, training and evaluating, smoothing, and sampling.
- Week 3: Classification and lexical semantics. Including: examples of text classification, multinomial logistic regression models and training, word senses and relations, WordNet, distributional semantics.
- Week 4: Word embeddings and neural networks. Including: dense word embeddings (word2vec), semantic similarity measures and evaluation, linear separability, multi-layer perceptron model and training.
- Week 5: Algorithmic bias, language variation, and RNNs. Including: algorithmic bias, protected characteristics, dialects and varieties, linguistic discrimination; recurrent neural network language models, long-distance dependencies.
- Week 6: Attention and Transformers. Including: sequence-to-sequence RNNs, attention, self-attention and Transformer blocks, parallelization, positional embeddings.
- Week 7: Advanced tokenization and LLMs. Including: LLM pretraining and fine-tuning, self-supervised objectives, revisiting tokenization and sampling, evaluation datasets and metrics for LLMs.
- Week 8: Masked language models and prompting. Including: BERT, Sentence-BERT, in-context learning, chain-of-thought reasoning.
- Week 9: Scaling laws and instruction tuning.
- Week 10: Alignment. Including: reinforcement learning from human feedback (RLHF), other topics TBD.
- Weeks 11+: Exam revision. No new material.
License
All rights reserved, The University of Edinburgh.