Data Mining Algorithms In R/Packages/RWeka/Weka classifier trees

Description

R interfaces to Weka regression and classification tree learners.

Usage

J48(formula, data, subset, na.action, control = Weka_control(), options = NULL)

LMT(formula, data, subset, na.action, control = Weka_control(), options = NULL)

M5P(formula, data, subset, na.action, control = Weka_control(), options = NULL)

DecisionStump(formula, data, subset, na.action, control = Weka_control(), options = NULL)

Arguments

formula, a symbolic description of the model to be fit.

data, an optional data frame containing the variables in the model.

subset, an optional vector specifying a subset of observations to be used in the fitting process.

na.action, a function which indicates what should happen when the data contain NAs.

control, an object of class Weka_control giving options to be passed to the Weka learner.

options, a named list of further options, or NULL (default).
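As a hedged illustration of the control argument, the sketch below passes two J48 options through Weka_control. The option names M (minimum number of instances per leaf) and C (pruning confidence) are the ones Weka lists for J48 via WOW("J48"); the values chosen here are arbitrary, and the block assumes that RWeka and a working Java installation are available.

```r
library(RWeka)

# Print the options that the Weka J48 learner accepts.
WOW("J48")

# Illustrative values only: require at least 5 instances per leaf (-M 5)
# and lower the pruning confidence to 0.1 (-C 0.1), which prunes harder
# than the Weka defaults.
m_ctl <- J48(Species ~ ., data = iris,
             control = Weka_control(M = 5, C = 0.1))
m_ctl
```

Any option shown by WOW for a given learner can be supplied in the same way, as a named argument to Weka_control.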

Details

There is a predict method for predicting from the fitted models, and a summary method based on evaluate_Weka_classifier. There is also a plot method for fitted binary Weka_trees via the facilities provided by package party. This converts the Weka_tree to a BinaryTree and then calls the plot method of that class (see plot.BinaryTree) with slight modifications to the default arguments. Provided the Weka classification tree learner implements the “Drawable” interface (i.e., provides a graph method), write_to_dot can be used to create a DOT representation of the tree for visualization via Graphviz or the Rgraphviz package.
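The predict and evaluation facilities described above can be sketched on held-out data as follows. This is a minimal example, not part of the original help page: the 100/50 split, the seed, and the variable names (train, test, fit) are assumptions made for illustration, and the newdata and type arguments are those of RWeka's predict method for Weka classifiers.

```r
library(RWeka)

set.seed(1)                          # reproducible hold-out split (arbitrary seed)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

fit <- J48(Species ~ ., data = train)

# Class labels and class membership probabilities for unseen rows.
pred_class <- predict(fit, newdata = test)
pred_prob  <- predict(fit, newdata = test, type = "probability")

table(observed = test$Species, predicted = pred_class)
head(pred_prob)

# Weka's own evaluation statistics on the held-out rows.
evaluate_Weka_classifier(fit, newdata = test)
```

Calling summary(fit) instead reports the same kind of statistics for the training data.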

J48 generates unpruned or pruned C4.5 decision trees (Quinlan, 1993).

LMT implements “Logistic Model Trees” (Landwehr, 2003; Landwehr et al., 2005).

M5P (where the ‘P’ stands for ‘prime’) generates M5 model trees using the M5’ algorithm, which was introduced in Wang & Witten (1997) and enhances the original M5 algorithm by Quinlan (1992).

DecisionStump implements decision stumps (trees with a single split only), which are frequently used as base learners for meta learners such as Boosting.
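To make the remark about base learners concrete, here is a small sketch that boosts decision stumps. It relies on AdaBoostM1, RWeka's interface to the Weka meta learner of that name, which is documented on a different help page than this one; the comparison on training data is only meant to show the mechanics and is optimistic as an accuracy estimate.

```r
library(RWeka)

# A single stump: one split only.
stump <- DecisionStump(Species ~ ., data = iris)

# AdaBoostM1 with decision stumps as the base learner (Weka option -W).
boosted <- AdaBoostM1(Species ~ ., data = iris,
                      control = Weka_control(W = "DecisionStump"))

# Proportion of training instances classified correctly by each model.
mean(predict(stump) == iris$Species)
mean(predict(boosted) == iris$Species)
```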

The model formulae should only use the ‘+’ and ‘-’ operators to indicate the variables to be included or not used, respectively. Argument options allows further customization. Currently, options model and instances (or partial matches for these) are used: if set to TRUE, the model frame or the corresponding Weka instances, respectively, are included in the fitted model object, possibly speeding up subsequent computations on the object. By default, neither is included.
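A brief sketch of both points, the restricted formula syntax and the options argument, is given below. The choice of variables is arbitrary, the object names are hypothetical, and names() is used only to inspect what the fitted object carries, since the exact component names are not stated on this page.

```r
library(RWeka)

# '+' lists the variables to include explicitly.
m_plus <- J48(Species ~ Petal.Length + Petal.Width, data = iris)

# '-' marks a variable as not used; here all others are taken via '.'.
# The options list asks for the model frame and the Weka instances to be
# kept in the fitted object.
m_opt <- J48(Species ~ . - Petal.Width, data = iris,
             options = list(model = TRUE, instances = TRUE))

names(m_plus)   # components of a default fit
names(m_opt)    # components when model and instances are requested
```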

Value

A list inheriting from classes Weka_tree and Weka_classifier with components including:

classifier, a reference (of class jobjRef) to a Java object obtained by applying the Weka buildClassifier method to build the specified model using the given control options.

predictions, a numeric vector or factor with the model predictions for the training instances (the results of calling the Weka classifyInstance method for the built classifier and each instance).

call, the matched call.
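The components listed above can be inspected directly on a fitted object. The following is a minimal sketch; the object name fit_val is hypothetical, and the block assumes RWeka (and therefore rJava) is installed.

```r
library(RWeka)

fit_val <- J48(Species ~ ., data = iris)

class(fit_val)              # class vector of the fitted object
class(fit_val$classifier)   # Java object reference held through rJava
head(fit_val$predictions)   # predictions for the training instances
fit_val$call                # the matched call
```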

Example

   library("RWeka")

   ## Fit a C4.5 tree to the iris data and inspect it.
   m1 <- J48(Species ~ ., data = iris)
   summary(m1)
   table(iris$Species, predict(m1))    # training-set confusion matrix
   if(require("party", quietly = TRUE)) plot(m1)

   ## DOT representation: print it, or write it to a file and plot it
   ## with Rgraphviz when that package is installed.
   write_to_dot(m1)
   if(require("Rgraphviz", quietly = TRUE)) {
       ff <- tempfile()
       write_to_dot(m1, ff)
       plot(agread(ff))
   }

   ## J48 on a data set shipped with RWeka; the response name needs backticks.
   DF2 <- read.arff(system.file("arff", "contact-lenses.arff", package = "RWeka"))
   m2 <- J48(`contact-lenses` ~ ., data = DF2)
   table(DF2$`contact-lenses`, predict(m2))
   if(require("party", quietly = TRUE)) plot(m2)

   ## M5' model tree for a numeric response.
   DF3 <- read.arff(system.file("arff", "cpu.arff", package = "RWeka"))
   m3 <- M5P(class ~ ., data = DF3)
   if(require("party", quietly = TRUE)) plot(m3)

   ## Logistic model tree.
   DF4 <- read.arff(system.file("arff", "weather.arff", package = "RWeka"))
   m4 <- LMT(play ~ ., data = DF4)
   table(DF4$play, predict(m4))

   ## J48 with reduced error pruning (Weka option -R).
   if(require("mlbench", quietly = TRUE) && require("party", quietly = TRUE)) {
       data("PimaIndiansDiabetes", package = "mlbench")
       m5 <- J48(diabetes ~ ., data = PimaIndiansDiabetes, control = Weka_control(R = TRUE))
       plot(m5)
   }