Pontifications

See Part 2A: Get Firefox 55 first 3 weeks’ words in preparation for NLTK n-gram analysis

  • 1. Here’s how I invoked the code:
python/print-ngram-from-file.py ff55.08August2017.28august2017.1st.3weeks.title.content.txt
#!/usr/local/bin/python3
# Print every bigram (adjacent token pair) found in the input file(s).
# word_tokenize needs NLTK's Punkt tokenizer data to be installed.
import fileinput
from nltk import bigrams, word_tokenize

# Join all input lines into one space-separated string.
raw = ""
for line in fileinput.input():
    raw += line.rstrip()
    raw += ' '
tokens = word_tokenize(raw)
tokens_bigrams = bigrams(tokens)
print(list(tokens_bigrams))
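
The raw bigram list gets long quickly, so a natural next step is to count which pairs occur most often. Here is a minimal sketch of that idea, not part of the original script: it uses only the standard library, substitutes a plain whitespace split for word_tokenize (an assumption, so punctuation stays attached to words), and the function name bigram_counts and the sample sentence are made up for illustration.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs in text (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for word_tokenize.
    tokens = text.lower().split()
    # Pair each token with its successor, the same shape nltk.bigrams yields.
    pairs = zip(tokens, tokens[1:])
    return Counter(pairs)


# Made-up sample text, not real support-forum data.
sample = "firefox 55 is slow firefox 55 crashes"
counts = bigram_counts(sample)
print(counts.most_common(1))
```

With real input, the text assembled from fileinput in the script above could be passed in place of sample, and most_common(20) would list the twenty most frequent pairs.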
