Data Mining Algorithms In R/Packages/RWeka/Weka tokenizers

Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)

NGramTokenizer(x, control = NULL)

WordTokenizer(x, control = NULL)

Arguments

x: a character vector with the strings to be tokenized.

control: an object of class Weka_control, a character vector of control options, or NULL (the default).
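As an illustration of the two control forms, the following calls should be equivalent (a minimal sketch, assuming RWeka and a working Java runtime are installed; -min and -max are Weka's NGramTokenizer options for the gram sizes):

library(RWeka)

## Structured control object and raw command-line options should
## produce the same tokens.
NGramTokenizer("the quick brown fox", control = Weka_control(min = 1, max = 3))
NGramTokenizer("the quick brown fox", control = c("-min", "1", "-max", "3"))

## WOW("NGramTokenizer") lists all control options the tokenizer accepts.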

Details

AlphabeticTokenizer is an alphabetic string tokenizer, where tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams with the given minimum and maximum numbers of grams.

WordTokenizer is a simple word tokenizer.
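A short illustration of the three tokenizers (a sketch, assuming RWeka and Java are installed; the exact tokens may vary with the underlying Weka version):

library(RWeka)

txt <- "R2D2 is a droid, and C3PO is too."

## Only contiguous alphabetic runs become tokens; digits and
## punctuation act as separators.
AlphabeticTokenizer(txt)
# e.g. "R" "D" "is" "a" "droid" "and" "C" "PO" "is" "too"

## Word bigrams; the gram sizes are passed via Weka_control.
NGramTokenizer(txt, control = Weka_control(min = 2, max = 2))

## Plain word tokenization.
WordTokenizer(txt)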

Value

A character vector with the tokenized strings.
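Since the result is an ordinary character vector, it can be passed directly to standard R functions; a quick check (same assumptions as above):

library(RWeka)

tokens <- WordTokenizer("the quick brown fox jumps over the lazy dog")
is.character(tokens)   # TRUE: the result is a plain character vector
table(tokens)          # e.g. term frequencies computed from the tokens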