{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Preprocessing with NLTK" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "First, if you haven't used iPython notebooks before, in order to run the code on this workbook, you can use the run commands in the Cell menu, or do shift-enter when an individual code cell is selected. Generally, you will have to run the cells in order for them to work properly. The output for a given cell (in any) will appear below the code after it has completed running. To make sure things are working, run the cell bellow:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello world\n" ] } ], "source": [ "print(\"hello world\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Okay, now let's do some simple preprocessing on this snippet from the html from the class website:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "text = '''\n", "
\n", " \n", " \n", " \n", " \n", "\n", "