For this subject we will use the Python programming language almost exclusively. You will want to install Python on your own machine (see below for details).
Many of the lectures, as well as the assignments, will have associated IPython notebook files. IPython notebooks are a nice way to organize your Python code into easily digestible, independently runnable pieces, with output (both text and visualisations) and comments appearing in separate blocks between code blocks.
The officially recommended way to load IPython notebooks (as well as regular Python code) in this class is Canopy. Canopy is a special Python environment which has IPython notebook support built in. The Engineering lab computers have Canopy installed, including all the Python packages and other software needed for this class. If you want to use Canopy on your own machine, you can get a free academic version here. You will need to sign up and request an academic license. After installation, the key packages needed for this class (NLTK, scikit-learn) can be installed through the Canopy package manager.
To load an IPython notebook in Canopy, select the editor from the main menu, and then click on File->Open. After you select the file, it will open in your web browser.
Later in the class we may also make use of the Stanford NLP tools (including the part-of-speech tagger and parser), which are on the lab computers and can be accessed via NLTK; see instructions here.
If you want to run IPython notebooks with a regular Python installation, you will need to install ipython and the ipython notebook software. Please see the ipython notebook site for installation instructions. Once this is installed, you simply run ipython notebook from a folder containing the downloaded .ipynb files, and it should find them and allow you to view the notebooks. Note that if you take this route, you will need to install many of the libraries yourself (these come bundled with Canopy). For example, you will need numpy, scipy and matplotlib, as well as the libraries mentioned above (nltk, scikit-learn), which you can install using Ubuntu packages, 'pip' or the like. Be warned: installing the mathematical libraries can sometimes prove difficult, due to their many dependencies on linear algebra development libraries, Fortran and the like. Canopy is a much easier solution, unless you're up for a challenge.
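If you install the libraries yourself, a short script along the following lines can confirm that Python is able to locate them. This is a minimal sketch and not part of the official class setup: the names `REQUIRED` and `find_missing` are hypothetical, chosen for illustration, and the list assumes the usual import names of the libraries mentioned above (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the libraries mentioned above (an assumption of this sketch).
# Note that scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "scipy", "matplotlib", "nltk", "sklearn"]


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only looks the modules up without importing them, so it runs quickly; a package that is found can still fail at import time if one of its compiled dependencies is broken.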
The version of Canopy officially supported for this class uses Python 3.5, so we recommend Python 3.5 even if you are not using Canopy.
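To see whether your own interpreter matches the recommended version, something like the following can be run from the command line or a notebook cell. It is only an illustrative sketch: `RECOMMENDED` and `matches_recommended` are hypothetical names, and the check compares just the major and minor version numbers.

```python
import sys

# Version recommended for the class; only major and minor numbers are compared.
RECOMMENDED = (3, 5)


def matches_recommended(version_info=sys.version_info, recommended=RECOMMENDED):
    """Return True if the major.minor part of `version_info` equals `recommended`."""
    return tuple(version_info[:2]) == tuple(recommended)


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if matches_recommended():
        print("Running Python " + current + ", which matches the recommended version.")
    else:
        print("Running Python " + current + "; the class recommends Python 3.5.")
```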
Date | Topic | Notebook |
Tue 5/3 | Introduction and Preprocessing | WSTA_N1_preprocessing.ipynb |
Wed 6/3 | Information Retrieval with the vector space model | WSTA_N2_information_retrieval.ipynb |
Tue 26/3 | Text classification | WSTA_N7_text_classification.ipynb |
Wed 27/3 | Ngram language modelling (error corrected in slides 28/3, again) | WSTA_N8_n-gram_language_models.ipynb |
Tue 2/4 | Lexical semantics | WSTA_N9_lexical_semantics.ipynb |
Wed 3/4 | Distributional semantics | WSTA_N10_distributional_semantics.ipynb |
Tue 9/4 | Part of Speech Tagging | WSTA_N11_part_of_speech_tagging.ipynb |
Tue 30/4 | Probabilistic Sequence Modelling | WSTA_N15_hidden_markov_models.ipynb |
Tue 7/5 | Context-Free Grammars | WSTA_N17_context-free_grammars.ipynb |
Wed 8/5 | Probabilistic Parsing (slides and notebook updated to correct mistakes, 8/5/19) | WSTA_N18_probabilistic_parsing.ipynb |
Tue 21/5 | Machine Translation, word based models | WSTA_N21_machine_translation.ipynb |