Data Mining Algorithms In R/Packages/RWeka/Weka clusterers

Description

R interfaces to Weka clustering algorithms.

Usage

Cobweb(x, control = NULL)

FarthestFirst(x, control = NULL)

SimpleKMeans(x, control = NULL)

XMeans(x, control = NULL)

DBScan(x, control = NULL)

Arguments

x, an R object with the data to be clustered.

control, an object of class Weka_control, or a character vector of control options, or NULL (default).
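The two forms of control can be compared side by side. The following is a minimal sketch, assuming RWeka and a working Java installation are available; the object names ctrl, cl_a and cl_b are illustrative only. WOW (the "Weka Option Wizard") lists the options a registered interface accepts.

```r
library(RWeka)

# Show the command-line options accepted by the SimpleKMeans interface.
WOW("SimpleKMeans")

# Form 1: a Weka_control object; N = 3 corresponds to the option "-N 3".
ctrl <- Weka_control(N = 3)
as.character(ctrl)  # the option vector that is handed to Weka

# Form 2: the same request written directly as a character vector.
cl_a <- SimpleKMeans(iris[, -5], control = ctrl)
cl_b <- SimpleKMeans(iris[, -5], control = c("-N", "3"))
```

Both calls request three clusters; leaving control at NULL uses the Weka defaults for the clusterer.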

Details

A predict method is available for obtaining class ids or class memberships from a fitted clusterer.
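As a hedged illustration of the predict method, the sketch below assumes the signature predict(object, newdata = NULL, type = c("class_ids", "memberships")) and that the chosen clusterer can report memberships; the three rows used as new data are an arbitrary choice.

```r
library(RWeka)

cl <- SimpleKMeans(iris[, -5], Weka_control(N = 3))

# Without newdata, the class ids of the training instances are returned.
ids <- predict(cl)

# Class ids for new observations (here: three rows of iris, reused as "new" data).
new_x <- iris[c(1, 51, 101), -5]
new_ids <- predict(cl, newdata = new_x)

# Class memberships: one row per instance, one column per cluster.
memb <- predict(cl, newdata = new_x, type = "memberships")
```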

Cobweb implements the Cobweb (Fisher, 1987) and Classit (Gennari et al., 1989) clustering algorithms.

FarthestFirst provides the “farthest first traversal algorithm” by Hochbaum and Shmoys, a fast, simple, approximate clusterer modeled after simple k-means.

SimpleKMeans provides clustering with the k-means algorithm.

XMeans provides k-means extended by an “Improve-Structure part” and automatically determines the number of clusters.

DBScan provides the “density-based clustering algorithm” by Ester, Kriegel, Sander, and Xu. Note that noise points are assigned to NA.
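To make the farthest-first idea concrete, here is a small base R sketch of the traversal. It is not Weka's implementation: the function farthest_first and its arguments are hypothetical, Euclidean distance is assumed, and the first center is fixed to row 1 rather than drawn at random.

```r
# Farthest-first traversal: choose k centers, each as far as possible from
# the centers already chosen, then assign every point to its nearest center.
farthest_first <- function(x, k, first = 1L) {
  x <- as.matrix(x)
  # Euclidean distance from every row of x to row j.
  dist_to <- function(j) sqrt(rowSums(sweep(x, 2, x[j, ])^2))

  centers <- integer(k)
  centers[1] <- first
  d_min <- dist_to(first)  # distance to the nearest center chosen so far

  for (i in seq_len(k)[-1]) {
    centers[i] <- which.max(d_min)               # farthest remaining point
    d_min <- pmin(d_min, dist_to(centers[i]))    # update nearest-center distance
  }

  # Distance matrix (rows: points, columns: centers) and nearest-center ids.
  d_all <- sapply(centers, dist_to)
  list(centers = centers,
       class_ids = max.col(-d_all, ties.method = "first"))
}

ff <- farthest_first(iris[, -5], k = 3)
table(ff$class_ids, iris$Species)
```

Unlike k-means, the centers are never moved after they are chosen, which is why a single pass over the data per center suffices.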

Value

A list inheriting from class Weka_clusterers with components including:

clusterer, a reference (of class jobjRef) to a Java object obtained by applying the Weka buildClusterer method to the training instances using the given control options.

class_ids, a vector of integers indicating the class to which each training instance is allocated (the results of calling the Weka clusterInstance method for the built clusterer and each instance).
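The components of the returned list can be inspected directly. This is a sketch under the assumption that rJava (on which RWeka depends) is installed; calling numberOfClusters through .jcall is one possible way of querying the Java object, not something this page prescribes.

```r
library(RWeka)

cl <- SimpleKMeans(iris[, -5], Weka_control(N = 3))

# The fitted object is a list; its class vector shows what it inherits from.
class(cl)

# class_ids: one integer per training instance.
head(cl$class_ids)
table(cl$class_ids)

# clusterer: a Java reference that can be queried through rJava.
# numberOfClusters() is a method of Weka's Clusterer interface returning an int ("I").
n_clusters <- rJava::.jcall(cl$clusterer, "I", "numberOfClusters")
n_clusters
```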

Example

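   library(RWeka)
   # k-means with three clusters, cross-tabulated against the known species
   cl1 <- SimpleKMeans(iris[, -5], Weka_control(N = 3))
   table(predict(cl1), iris$Species)
   # XMeans with between 3 (-L) and 7 (-H) clusters, using a KDTree
   cl2 <- XMeans(iris[, -5],
                 c("-L", 3, "-H", 7, "-use-kdtree",
                   "-K", "weka.core.neighboursearch.KDTree -P"))
   table(predict(cl2), iris$Species)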