Data Mining Algorithms In R/Classification/JRip
Synopsis
editThis class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP. It is based in association rules with reduced error pruning (REP), a very common and effective technique found in decision tree algorithms. In REP for rules algorithms, the training data is split into a growing set and a pruning set. First, an initial rule set is formed that over ts the growing set, using some heuristic method. This overlarge rule set is then repeatedly simplified by applying one of a set of pruning operators typical pruning operators would be to delete any single condition or any single rule. At each stage of simplification, the pruning operator chosen is the one that yields the greatest reduction of error on the pruning set. Simplification ends when applying any pruning operator would increase error on the pruning set.
The algorithm is briefly described as follows: Initialize RS = {}, and for each class from the less prevalent one to the more frequent one, DO:
1. Building stage:
editRepeat 1.1 and 1.2 until the description length (DL) of the rule set and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.
1.1. Grow phase:
editGrow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with highest information gain: p(log(p/t)-log(P/T)).
1.2. Prune phase:
editIncrementally prune each rule and allow the pruning of any final sequences of the antecedents;The pruning metric is (p-n)/(p+n) – but it's actually 2p/(p+n) -1, so in this implementation we simply use p/(p+n) (actually (p+1)/(p+n+2), thus if p+n is 0, it's 0.5).
2. Optimization stage:
editafter generating the initial rule set {Ri}, generate and prune two variants of each rule Ri from randomized data using procedure 1.1 and 1.2. But one variant is generated from an empty rule while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N).Then the smallest possible DL for each variant and the original rule is computed. The variant with the minimal DL is selected as the final representative of Ri in the rule set. After all the rules in {Ri} have been examined and if there are still residual positives, more rules are generated based on the residual positives using Building Stage again. 3. Delete the rules from the rule set that would increase the DL of the whole rule set if it were in it. and add resultant rule set to RS. ENDDO Note that there seem to be 2 bugs in the original ripper program that would affect the rule set size and accuracy slightly. This implementation avoids these bugs and thus is a little bit different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seems to be some trivial difference between this implementation and the original ripper, especially for audiology data in UCI repository, where there are lots of classes of few instances. Details please see:
- William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.
Installation
editThe caret package can be installed by using the following command on R's command-line:
install.packages("caret", dependencies = TRUE)
The above command shall recursively download and install all packages that caret depend to along with fpc itself.
Example
editThe example in this section will illustrate the carets's JRip usage on the IRIS database:
>library(caret) >library(RWeka) >data(iris) >TrainData <- iris[,1:4] >TrainClasses <- iris[,5] >jripFit <- train(TrainData, TrainClasses,method = "JRip")
Study Case
editDataset
editThe Iris dataset contains 150 instances, corresponding to three equally-frequent species of iris plant (Iris setosa, Iris versicolour, and Iris virginica). An Iris versicolor is shown below, courtesy of Wikimedia Commons.
Each instance contains four attributes:sepal length in cm, sepal width in cm, petal length in cm, and petal width in cm. The next picture shows each attribute plotted against the others, with the different classes in color.
Execution and Results
editFirst of all, we need to specify which base we are going to use:
> data(iris) > summary(iris) Sepal.Length Sepal.Width Petal.Length Petal.Width Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 Median :5.800 Median :3.000 Median :4.350 Median :1.300 Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800 Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500 Species setosa :50 versicolor:50 virginica :50
After that, we are ready to create a Naïve Bayes model to the dataset using the first 4 columns to predict the fifth.
>data(iris) >varIndex <- 1:numSamples > >TrainData <- iris[,1:4] >TrainClasses <- iris[,5] >jripFit <- train(TrainData, TrainClasses,method = "JRip",preProcess = c("center", "scale"),tuneLength = 10,trControl = trainControl(method = "cv"))
Output
editLoading required package: class Attaching package: 'class' The following object(s) are masked from 'package:reshape': condense Fitting: NumOpt=1 Fitting: NumOpt=2 Fitting: NumOpt=3 Fitting: NumOpt=4 Fitting: NumOpt=5 Fitting: NumOpt=6 Fitting: NumOpt=7 Fitting: NumOpt=8 Fitting: NumOpt=9 Fitting: NumOpt=10 Aggregating results Selecting tuning parameters Fitting model on full training set
Result
edit> jripFit Call: train.default(x = TrainData, y = TrainClasses, method = "JRip", preProcess = c("center", "scale"), trControl = trainControl(method = "cv"), tuneLength = 10) 150 samples 4 predictors Pre-processing: centered, scaled Resampling: Cross-Validation (10 fold) Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... Resampling results across tuning parameters: NumOpt Accuracy Kappa Accuracy SD Kappa SD Selected 1 0.953 0.93 0.045 0.0675 2 0.953 0.93 0.045 0.0675 * 3 0.933 0.9 0.0444 0.0667 4 0.94 0.91 0.0584 0.0876 5 0.94 0.91 0.0584 0.0876 6 0.94 0.91 0.0584 0.0876 7 0.94 0.91 0.0584 0.0876 8 0.94 0.91 0.0584 0.0876 9 0.94 0.91 0.0584 0.0876 10 0.94 0.91 0.0584 0.0876 Accuracy was used to select the optimal model using the largest value. The final value used for the model was NumOpt = 2.
The caret package ran the training tuning the NumOpt JRip parameter from 1 to 10 and chose the best performance which is NumOpt=2 with a 95.3% accuracy. If some other algorithm was chosen, other algorithm parameter would be tuned.
If we plot the results we have a plot of the parameter choosing accuracy:
>plot(jripFit)
-
JRip accuracy vs. NumOpt parameter.
References
edit- William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.