For this subject we will be using the Python programming language almost exclusively. You will want to install Python on your own machine (see below for details).

Many of the lectures, as well as the assignments, will have associated iPython notebook files. iPython notebooks are a nice way to organize your Python code into easily digestible, independently runnable bits, with output (both text and visualisations) and comments appearing in separate blocks between code blocks.

The officially recommended way to load iPython notebooks (as well as regular Python code) in the context of this class is Canopy, a special Python environment which has iPython notebook support built in. The Engineering lab computers have Canopy installed, including all the Python packages and other software needed for this class. If you want to use Canopy on your own machine, you can get a free academic version here; you will need to sign up and request an academic license. After installation, the key packages needed for this class (NLTK, scikit-learn) can be installed through the Canopy package manager. To load an iPython notebook in Canopy, select the editor from the main menu, and then click on File->Open. After you select the file, it will open in your web browser.

Later in the class we may also make use of the Stanford NLP tools (including the part-of-speech tagger and parser), which are on the lab computers and can be accessed via NLTK; see instructions here.

If you want to run iPython notebooks with a regular Python installation, you will need to install ipython and the ipython notebook software. Please see the ipython notebook site for installation instructions. Once this is installed, you simply run ipython notebook from a folder containing the downloaded .ipynb files, and it should find them and allow you to view the notebooks. Note that if you go this route, you will need to install many of the libraries yourself (these come bundled with Canopy). For example, you will need numpy, scipy and matplotlib, as well as the libraries mentioned above (nltk, scikit-learn), which you can install using Ubuntu packages, 'pip' or the like. Be warned: installing the mathematical libraries can sometimes prove difficult, due to their many dependencies on linear algebra development libraries, Fortran and the like. Canopy is a much easier solution, unless you're up for a challenge.
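If you do set things up yourself, a quick way to see which of the libraries named above are still missing is to ask Python whether it can locate each one. The following is only a minimal sketch: the function name find_missing and the list REQUIRED_MODULES are made up for illustration, and the list assumes the usual import names (scikit-learn is imported as sklearn). It checks only that a module can be found, not that it is a suitable version.

```python
import importlib.util

# Import names for the libraries mentioned above (an assumption for this
# sketch); note that scikit-learn is imported under the name "sklearn".
REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "nltk", "sklearn"]


def find_missing(modules):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Anything reported as missing would then need to be installed with whichever mechanism you are using (Ubuntu packages, 'pip' or the like).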

The version of Canopy officially supported for this class uses Python 3.5, and so even if you are not using Canopy we recommend you use Python 3.5.
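If you are unsure which version your own installation provides, you can ask the interpreter directly. This is a small sketch rather than an official check: the helper matches_recommended and the constant RECOMMENDED are hypothetical names introduced here.

```python
import sys

# Major and minor version recommended for this class.
RECOMMENDED = (3, 5)


def matches_recommended(version_info=sys.version_info, recommended=RECOMMENDED):
    """Return True if the major.minor version equals the recommended one."""
    return tuple(version_info[:2]) == recommended


# %-formatting is used instead of f-strings so the script also runs on 3.5.
print("Running Python %d.%d" % tuple(sys.version_info[:2]))
if not matches_recommended():
    print("Note: this class recommends Python %d.%d" % RECOMMENDED)
```

Running this from a notebook cell or from the command line shows which interpreter is actually in use, which is helpful when several Python versions are installed side by side.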

iPython Notebooks

Date | Topic | Notebook
Tue 5/3 | Introduction and Preprocessing | WSTA_N1_preprocessing.ipynb
Wed 6/3 | Information Retrieval with the vector space model | WSTA_N2_information_retrieval.ipynb
Tue 26/3 | Text classification | WSTA_N7_text_classification.ipynb
Wed 27/3 | Ngram language modelling (error corrected in slides 28/3, again) | WSTA_N8_n-gram_language_models.ipynb
Tue 2/4 | Lexical semantics | WSTA_N9_lexical_semantics.ipynb
Wed 3/4 | Distributional semantics | WSTA_N10_distributional_semantics.ipynb
Tue 9/4 | Part of Speech Tagging | WSTA_N11_part_of_speech_tagging.ipynb
Tue 30/4 | Probabilistic Sequence Modelling | WSTA_N15_hidden_markov_models.ipynb
Tue 7/5 | Context-Free Grammars | WSTA_N17_context-free_grammars.ipynb
Wed 8/5 | Probabilistic Parsing (slides and notebook updated to correct mistakes, 8/5/19) | WSTA_N18_probabilistic_parsing.ipynb
Tue 21/5 | Machine Translation, word based models | WSTA_N21_machine_translation.ipynb