The aim of this subject is for students to develop an understanding of the main algorithms used in natural language processing and text retrieval, for use in a diverse range of applications including text classification, information retrieval, machine translation, and question answering. Topics covered include vector space models, part-of-speech tagging, n-gram language modelling, syntactic parsing and neural sequence models. The programming language used is Python; see the detailed configuration instructions for more information on its use in the workshops and assignments, and on installing it at home.

Class hours


Tue 3:15-4:15pm, Law Building-GM15 (David P. Derham Theatre)
Wed 1:00-2:00pm, Law Building-GM15 (David P. Derham Theatre)

Office hour

Tue 11am-noon, Doug McDonell-9.02

Workshops run from Week 2 onwards. You will be assigned to one of the following timeslots:

Monday: 9-10am Doug McDonell-502; 11-12pm Elec. Engineering-121; 1:15-2:15pm 221 Bouverie St-B113; 5:15-6:15pm 221 Bouverie St-B132; 5:15-6:15pm 221 Bouverie St-B116; 6:15-7:15pm Old Engineering-EDS4
Tuesday: 4:15-5:15pm 221 Bouverie St-B113; 5:15-6:15pm 221 Bouverie St-B132; 6:15-7:15pm Old Engineering-EDS4
Wednesday: 11-12pm Elec. Engineering-121; 3:15-4:15pm Alice Hoy-211; 5:15-6:15pm 221 Bouverie St-B132
Thursday: 2:15-3:15pm Alice Hoy-211; 3:15-4:15pm Alice Hoy-210
Friday: 10-11am 221 Bouverie St-B117

Please see the workshop page for each week's worksheet.

The instructor for the subject is A/Prof. Trevor Cohn. The senior tutor for the subject is Winn Chow and the head tutor for the subject is Ekaterina Vylomova. The tutors for the subject are Ekaterina Vylomova, Winn Chow, Navnita Nandakumar, Nitika Mathur, Xudong Han, Zenan Zhai, Shivashankar Subramanian and Andrei Shcherbakov.


If you have questions, please post them to the discussion forum in the LMS. Given the size of the subject, it is not practical to respond to individual emails, and in any case any question you have is likely to be relevant to other students. Otherwise, please talk to your tutor, or direct your queries to Winn or Ekaterina, the senior/head tutors.


There are several textbooks used for the class:
Jurafsky, Daniel S.; Martin, James H.; Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Third Edition (incomplete draft) (JM3).
Eisenstein, Jacob; Natural Language Processing, draft textbook, 15/10/18 (E18).
Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich; Introduction to Information Retrieval, Cambridge University Press, 2008 (IIR).
Koehn, Philipp; Statistical Machine Translation, Cambridge University Press, 2009 (K09).
Much of the reading will be from JM3, although most NLP topics are also covered well by E18, so feel free to read both. All of the above are either free or available from the university library as an ebook using your student login (K09). The texts will be linked below in the Materials column. We will also be using the NLTK software tools extensively in this class, so we also recommend:
Bird, Steven; Klein, Ewan; Loper, Edward; Natural Language Processing with Python, O'Reilly, 2009 (updated draft version available online).
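As a small taste of the kind of text preprocessing covered in the first weeks (and which NLTK automates with trained tokenisers and stemmers), here is a minimal sketch using only the Python standard library; the `tokenize` helper and the example sentence are our own illustration, not NLTK code.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on runs of letters/digits: a crude stand-in
    # for NLTK's trained tokenizers, good enough to show the idea.
    return re.findall(r"[a-z0-9]+", text.lower())

doc = "The cat sat on the mat. The mat was flat."
tokens = tokenize(doc)
counts = Counter(tokens)
print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```

In the workshops you would instead use NLTK's tokenisers, which handle punctuation, contractions and sentence boundaries far more carefully than a single regular expression can.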

Please see the reading page for each week's reading.


There will be three short homework assignments, released roughly every three weeks starting in Week 2, and a final project due near the end of the semester.


We'll put the lecture slides up here as we cover the material, as well as pointers to the required reading.

Date Topic Materials
Tue 5/3 Introduction and Preprocessing Slides: WSTA_L1_introduction.pdf
Reading: JM3 Ch. 2
Notebook: WSTA_N1_preprocessing.ipynb
Wed 6/3 Information Retrieval with the vector space model Slides: WSTA_L2_ir_vsm.pdf
Reading: IIR Chapter 6
Notebook: WSTA_N2_information_retrieval.ipynb
Thu 7/3 Workshop on Python basics (optional) Worksheet: week1-python-01.pdf
from Mon 11/3 Workshop on preprocessing and information retrieval Worksheet: workshop-02.pdf
Tue 12/3 Index compression and efficient query processing Slides: WSTA_L3_IR.pdf
Reading: IIR Chapter 5
Wed 13/3 Query completion and query expansion Slides: WSTA_L4_IR.pdf
Reading: IIR Chapter 9
from Mon 18/3 Workshop on index compression and efficient query processing Worksheet: workshop-03.pdf
Tue 19/3 Index construction and advanced queries Slides: WSTA_L5_IR.pdf
Reading: IIR Chapter 4
  Liu, Learning to Rank for Information Retrieval, section 1.3: Learning to Rank
Wed 20/3 IR Evaluation and Learning to Rank Slides: WSTA_L6_IR_evaluation.pdf
Reading: IIR Chapter 8
from Mon 25/3 Workshop on index construction and IR evaluation Worksheet: workshop-04.pdf
Tue 26/3 Text classification Slides: WSTA_L7_text_classification.pdf
Reading: JM3 Ch. 4
  JM3 Ch. 5
  alternatively E18 2.1, 4-4.1, 4.3-4.4.1
Notebook: WSTA_N7_text_classification.ipynb
Wed 27/3 Ngram language modelling (slides corrected again 28/3) Slides: WSTA_L8_n-gram_language_models.pdf
Reading: E18 Ch 6 (skipping 6.3)
Notebook: WSTA_N8_n-gram_language_models.ipynb
from Mon 1/4 Workshop on text classification and ngram language modelling Worksheet: workshop-05.pdf
Tue 2/4 Lexical semantics Slides: WSTA_L9_lexical_semantics.pdf
Reading: JM3 C.1-C.3
Notebook: WSTA_N9_lexical_semantics.ipynb
Wed 3/4 Distributional semantics Slides: WSTA_L10_distributional_semantics.pdf
Reading: E18 14-14.6 (skipping 14.4)
  or JM3 Ch. 15
Notebook: WSTA_N10_distributional_semantics.ipynb
from Mon 8/4 Workshop on word semantics Worksheet: workshop-06.pdf
Tue 9/4 Part of Speech Tagging Slides: WSTA_L11_part_of_speech_tagging.pdf
Reading: JM3 8.1-8.3, 8.5.1
Notebook: WSTA_N11_part_of_speech_tagging.ipynb
Wed 10/4 Deep learning for language models and tagging Slides: WSTA_L12_neural_sequence_models.pdf
Reading: E18 6.3 (skip 6.3.1), 7.6
from Mon 15/4 Workshop on POS and deep learning for language models (schedule altered for Good Friday holiday) Worksheet: workshop-07.pdf
Tue 16/4 Information Extraction Slides: WSTA_L13_information_extraction.pdf
Reading: JM3 Ch. 17 - 17.2
Wed 17/4 Question Answering Slides: WSTA_L14_question_answering.pdf
Reading: JM3 Ch. 23 (skip 23.1.7, 23.2.3, 23.3)
  E18 17.5.2 (skipping methods)
Mon 22/4 Easter break
from Mon 29/4 Workshop on information extraction and question answering Worksheet: workshop-08.pdf
Tue 30/4 Probabilistic Sequence Modelling Slides: WSTA_L15_probabilistic_sequence_models.pdf
Reading: JM3 Ch. A.1, A.2, A.4
Notebook: WSTA_N15_hidden_markov_models.ipynb
Wed 1/5 Language theory and automata Slides: WSTA_L16_finite_state_automata.pdf
Reading: E18 Chapter 9.1 (skip starred parts)
from Mon 6/5 Workshop on HMM and FSA Worksheet: workshop-09.pdf
Tue 7/5 Context-Free Grammars Slides: WSTA_L17_context-free_grammars.pdf
Reading: JM3 Ch. 10.1-10.5
  JM3 Ch. 11-11.2
Notebook: WSTA_N17_context-free_grammars.ipynb
Wed 8/5 Probabilistic Parsing (slides and notebook updated to correct mistakes, 8/5/19) Slides: WSTA_L18_probabilistic_grammars.pdf
Reading: JM3 Ch. 12-12.6
Notebook: WSTA_N18_probabilistic_parsing.ipynb
from Mon 13/5 Workshop on CFG and probabilistic parsing Worksheet: workshop-10.pdf
Tue 14/5 Dependency parsing Slides: WSTA_L19_dependency.pdf
Reading: JM3 Ch. 13
Wed 15/5 Discourse Slides: WSTA_L20_discourse.pdf
Reading: JM2 Ch. 21 (21.1-21.3, 21.5-21.6)
from Mon 20/5 Workshop on dependency parsing and discourse Worksheet: workshop-11.pdf
Tue 21/5 Machine Translation, word based models Slides: WSTA_L21_machine_translation_word.pdf
Reading: JM2 Chapter 25, intro, 25.3-25.6
Notebook: WSTA_N21_machine_translation.ipynb
Wed 22/5 Machine translation, phrase based translation and neural encoder-decoder Slides: WSTA_L22_machine_translation_phrase.pdf
Reading: JM2 Chapter 25, 25.7-25.9
  E18 18.3–18.3.2
from Mon 27/5 Workshop on machine translation Worksheet: workshop-12.pdf
Tue 28/5 Memory-enhanced models for Discourse Understanding (Fei Liu) Slides: WSTA_L23_memory.pdf
Wed 29/5 Subject review Slides: WSTA_L24_review.pdf
Reading: exams from previous years on the library website (2017 most relevant)
  see LMS for solutions to 2017 exam