Stata/Natural Language Processing

Reading a text file

If every line is short (at most 244 characters, the maximum length of a string variable in older versions of Stata), insheet can read the text file into Stata's memory.

. insheet using toto.txt, clear
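
Note that insheet treats tabs or commas as column separators, so a line of ordinary prose may be split across several variables. A possible workaround, sketched here on the assumption that the character | never appears in toto.txt and that the file has no header row, is to declare an unused delimiter so that each line lands in a single variable (named v1 by default):

. * assumption: "|" does not occur in the file, so no line is split
. insheet using toto.txt, clear nonames delimiter("|")
. * check that each line was stored in one string variable
. describe
. list in 1/5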

String functions

First have a look at the list of string functions already included in Stata.

. h string functions
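
As a minimal sketch of how a few of these functions can be combined, the lines below assume the text has been read into a string variable called v1 (one line of text per observation); the new variable names are only illustrative.

. * lower-case copy of each line
. generate low = lower(v1)
. * number of characters and number of space-separated words
. generate nchar = length(v1)
. generate nwords = wordcount(v1)
. * first word of each line
. generate first = word(v1, 1)
. * remove every comma (the final . means "replace all occurrences")
. generate nocomma = subinstr(v1, ",", "", .)
. * position of the first occurrence of "stata" in the lower-cased line (0 if absent)
. generate pos = strpos(low, "stata")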

Regular Expressions

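Stata includes three functions for regular expressions: regexm(), which returns 1 if a string matches a pattern and 0 otherwise; regexr(), which replaces the first match with another string; and regexs(), which returns the subexpression captured by the preceding regexm() call.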
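
The following lines illustrate the three functions, first on a literal string and then on a variable. The variable v1 (one line of text per observation) and the new variable names are assumptions made for the example.

. * 1 if the string contains at least one digit, 0 otherwise
. display regexm("Call 555-1234 now", "[0-9]+")
. * replace the first run of digits with #
. display regexr("Call 555-1234 now", "[0-9]+", "#")
. * flag observations of v1 that contain a number
. generate byte hasnum = regexm(v1, "[0-9]+")
. * extract the first number; regexs(0) is the whole match of the regexm() in the if condition
. generate firstnum = regexs(0) if regexm(v1, "[0-9]+")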


Wordscores

Ken Benoit, Michael Laver and Will Lowe have developed wordscores, a set of Stata commands that read text files, count the frequency of each word and compute an index of similarity between texts.

Last modified on 30 December 2010, at 05:27