Data Mining Algorithms In R/Packages/RWeka/Weka tokenizers

Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)

NGramTokenizer(x, control = NULL)

WordTokenizer(x, control = NULL)

Arguments

x: a character vector with the strings to be tokenized.

control: an object of class Weka_control, a character vector of control options, or NULL (the default).
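
For example, the n-gram range used by NGramTokenizer can be passed either as a Weka_control object or as an equivalent character vector of Weka command-line options (a minimal sketch; the example string is arbitrary):

library(RWeka)

## Pass the range as a Weka_control object ...
NGramTokenizer("the quick brown fox", Weka_control(min = 2, max = 2))

## ... or, equivalently, as a character vector of control options.
NGramTokenizer("the quick brown fox", c("-min", "2", "-max", "2"))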

Details

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous sequences of alphabetic characters.

NGramTokenizer splits strings into n-grams, with the minimal and maximal number of grams given as control options.

WordTokenizer is a simple word tokenizer.
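
The following sketch contrasts the three tokenizers on a small example (illustrative only; the exact tokens returned depend on the installed Weka version and its default delimiter settings):

library(RWeka)

s <- "R2-D2 beeped, twice!"

WordTokenizer(s)        ## splits on the default delimiters (whitespace and punctuation)
AlphabeticTokenizer(s)  ## alphabetic runs only, e.g. "R" "D" "beeped" "twice"
NGramTokenizer(s, Weka_control(min = 1, max = 2))  ## word unigrams and bigrams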

Value

A character vector with the tokenized strings.