Data Mining Algorithms In R/Classification/JRip

Synopsis

This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP. It is based in association rules with reduced error pruning (REP), a very common and effective technique found in decision tree algorithms. In REP for rules algorithms, the training data is split into a growing set and a pruning set. First, an initial rule set is formed that over ts the growing set, using some heuristic method. This overlarge rule set is then repeatedly simplified by applying one of a set of pruning operators typical pruning operators would be to delete any single condition or any single rule. At each stage of simplification, the pruning operator chosen is the one that yields the greatest reduction of error on the pruning set. Simplification ends when applying any pruning operator would increase error on the pruning set.

The algorithm is briefly described as follows: Initialize RS = {}, and for each class from the less prevalent one to the more frequent one, DO:

1. Building stage:

Repeat 1.1 and 1.2 until the description length (DL) of the rule set and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.

1.1. Grow phase:

Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with highest information gain: p(log(p/t)-log(P/T)).

1.2. Prune phase:

Incrementally prune each rule and allow the pruning of any final sequences of the antecedents;The pruning metric is (p-n)/(p+n) – but it's actually 2p/(p+n) -1, so in this implementation we simply use p/(p+n) (actually (p+1)/(p+n+2), thus if p+n is 0, it's 0.5).

2. Optimization stage:

after generating the initial rule set {Ri}, generate and prune two variants of each rule Ri from randomized data using procedure 1.1 and 1.2. But one variant is generated from an empty rule while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N).Then the smallest possible DL for each variant and the original rule is computed. The variant with the minimal DL is selected as the final representative of Ri in the rule set. After all the rules in {Ri} have been examined and if there are still residual positives, more rules are generated based on the residual positives using Building Stage again. 3. Delete the rules from the rule set that would increase the DL of the whole rule set if it were in it. and add resultant rule set to RS. ENDDO Note that there seem to be 2 bugs in the original ripper program that would affect the rule set size and accuracy slightly. This implementation avoids these bugs and thus is a little bit different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seems to be some trivial difference between this implementation and the original ripper, especially for audiology data in UCI repository, where there are lots of classes of few instances. Details please see:

William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.

Installation

The caret package can be installed by using the following command on R's command-line:

install.packages("caret", dependencies = TRUE)

The above command shall recursively download and install all packages that caret depend to along with fpc itself.

Example

The example in this section will illustrate the carets's JRip usage on the IRIS database:

>library(caret)
>library(RWeka)
>data(iris)
>TrainData <- iris[,1:4]
>TrainClasses <- iris[,5]
>jripFit <- train(TrainData, TrainClasses,method = "JRip")

Study Case

Dataset

The Iris dataset contains 150 instances, corresponding to three equally-frequent species of iris plant (Iris setosa, Iris versicolour, and Iris virginica). An Iris versicolor is shown below, courtesy of Wikimedia Commons.

Iris versicolor

Each instance contains four attributes:sepal length in cm, sepal width in cm, petal length in cm, and petal width in cm. The next picture shows each attribute plotted against the others, with the different classes in color.

Plotting the Iris attributes

Execution and Results

First of all, we need to specify which base we are going to use:

> data(iris)
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50

After that, we are ready to create a Naïve Bayes model to the dataset using the first 4 columns to predict the fifth.

>data(iris)
>varIndex <- 1:numSamples
>
>TrainData <- iris[,1:4]
>TrainClasses <- iris[,5]
>jripFit <- train(TrainData, TrainClasses,method = "JRip",preProcess = c("center", "scale"),tuneLength = 10,trControl = trainControl(method = "cv"))

Output

Loading required package: class

Attaching package: 'class'

The following object(s) are masked from 'package:reshape':

   condense

Fitting: NumOpt=1 
Fitting: NumOpt=2 
Fitting: NumOpt=3 
Fitting: NumOpt=4 
Fitting: NumOpt=5 
Fitting: NumOpt=6 
Fitting: NumOpt=7 
Fitting: NumOpt=8 
Fitting: NumOpt=9 
Fitting: NumOpt=10 
Aggregating results
Selecting tuning parameters
Fitting model on full training set

Result

> jripFit

Call:
train.default(x = TrainData, y = TrainClasses, method = "JRip", 
    preProcess = c("center", "scale"), trControl = trainControl(method = "cv"), 
    tuneLength = 10)

150 samples
  4 predictors

Pre-processing: centered, scaled 
Resampling: Cross-Validation (10 fold) 

Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 

Resampling results across tuning parameters:

  NumOpt  Accuracy  Kappa  Accuracy SD  Kappa SD  Selected
  1       0.953     0.93   0.045        0.0675            
  2       0.953     0.93   0.045        0.0675    *       
  3       0.933     0.9    0.0444       0.0667            
  4       0.94      0.91   0.0584       0.0876            
  5       0.94      0.91   0.0584       0.0876            
  6       0.94      0.91   0.0584       0.0876            
  7       0.94      0.91   0.0584       0.0876            
  8       0.94      0.91   0.0584       0.0876            
  9       0.94      0.91   0.0584       0.0876            
  10      0.94      0.91   0.0584       0.0876            

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was NumOpt = 2.

The caret package ran the training tuning the NumOpt JRip parameter from 1 to 10 and chose the best performance which is NumOpt=2 with a 95.3% accuracy. If some other algorithm was chosen, other algorithm parameter would be tuned.

If we plot the results we have a plot of the parameter choosing accuracy:

>plot(jripFit)

JRip accuracy vs. NumOpt parameter.

References

William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.