Data Mining Algorithms In R/Clustering/CLARA


An obvious way of clustering larger datasets is to extend existing methods so that they can cope with a larger number of objects. The focus here is on clustering large numbers of objects rather than a small number of objects in high dimensions. Kaufman and Rousseeuw (1990) suggested the CLARA (Clustering LARge Applications) algorithm for this purpose. CLARA extends their k-medoids approach to a large number of objects. It works by clustering a sample from the dataset and then assigning all objects in the dataset to the resulting clusters.

Technique To Be Discussed

This work is focused on CLARA, a technique for clustering large datasets.


Algorithm

Symbol   Definition
D        Data set to be clustered
n        Number of objects in D
O_i      Object i in D
k        Number of clusters
S        A sample of D
s        Size of S

Table 1: Summary of symbols and definitions

CLARA (Clustering LARge Applications) relies on a sampling approach to handle large data sets. Instead of finding medoids for the entire data set, CLARA draws a small sample from the data set and applies the PAM algorithm to generate an optimal set of medoids for the sample. The quality of the resulting medoids is measured by the average dissimilarity between every object in the entire data set D and the medoid of its cluster, defined as the following cost function:

$$\mathrm{Cost}(M, D) = \frac{\sum_{i=1}^{n} \mathrm{dissimilarity}(O_i, \mathrm{rep}(M, O_i))}{n}$$

where M is a set of selected medoids, dissimilarity(O_i, O_j) is the dissimilarity between objects O_i and O_j, and rep(M, O_i) returns the medoid in M that is closest to O_i.
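As a small worked example of the cost function described above, the following R sketch computes the average dissimilarity for five one-dimensional objects and two medoids. The data values and variable names are made up for illustration, and Euclidean distance is assumed as the dissimilarity.

```r
# Five objects on a line and two candidate medoids (illustrative values)
objects <- matrix(c(1, 2, 3, 10, 12), ncol = 1)
medoids <- matrix(c(2, 10), ncol = 1)

# Distances from every object (rows) to every medoid (columns)
d <- as.matrix(dist(rbind(medoids, objects)))[-(1:2), 1:2]

# Each object contributes the distance to its closest medoid: 1, 0, 1, 0, 2
cost <- mean(apply(d, 1, min))
cost  # 0.8
```

The objects 1, 2 and 3 are represented by the medoid 2, and the objects 10 and 12 by the medoid 10, so the cost is (1 + 0 + 1 + 0 + 2) / 5 = 0.8.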

To alleviate sampling bias, CLARA repeats the sampling and clustering process a pre-defined number of times and subsequently selects as the final clustering result the set of medoids with the minimal cost. Assume q to be the number of samplings. The CLARA algorithm is detailed in Figure 1.

Figure 1: The CLARA algorithm
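The steps described in the text (draw q samples of size s, run PAM on each, evaluate the medoids on the whole data set, keep the cheapest set) can be sketched in R. This is a minimal illustration under stated assumptions, not the implementation used by the cluster package: the helper names clara_cost and clara_sketch are hypothetical, Euclidean distance is assumed, and pam() from the cluster package does the clustering of each sample.

```r
library(cluster)  # provides pam()

# Cost(M, D): average Euclidean distance from each object to its closest medoid
clara_cost <- function(x, medoids) {
  d <- sapply(seq_len(nrow(medoids)), function(j) {
    sqrt(rowSums(sweep(x, 2, medoids[j, ])^2))
  })
  d <- matrix(d, nrow = nrow(x))          # n x k distance matrix
  list(cost = mean(apply(d, 1, min)),     # average dissimilarity over all of D
       cluster = apply(d, 1, which.min))  # index of the closest medoid
}

# Hypothetical helper: q samples of size s, PAM on each, keep the cheapest medoids
clara_sketch <- function(x, k, q = 5, s = min(nrow(x), 40 + 2 * k)) {
  x <- as.matrix(x)
  best <- NULL
  for (i in seq_len(q)) {
    idx <- sample(nrow(x), s)              # draw a sample S from D
    fit <- pam(x[idx, , drop = FALSE], k)  # PAM medoids of the sample
    res <- clara_cost(x, fit$medoids)      # evaluate them on the whole data set
    if (is.null(best) || res$cost < best$cost) {
      best <- c(list(medoids = fit$medoids), res)
    }
  }
  best
}

set.seed(1)
x <- rbind(cbind(rnorm(200, 0, 8), rnorm(200, 0, 8)),
           cbind(rnorm(300, 50, 8), rnorm(300, 50, 8)))
res <- clara_sketch(x, k = 2)
res$medoids
res$cost
```

The returned list holds the selected medoids, their cost on the full data set, and the cluster index of every object.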

Since CLARA adopts a sampling approach, the quality of its clustering results depends greatly on the size of the sample. When the sample size is small, CLARA’s efficiency in clustering large data sets comes at the cost of clustering quality.
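One way to examine the effect of the sample size is to run clara() from the cluster package with several values of sampsize and compare the resulting objective values. The sketch below uses synthetic data and arbitrarily chosen sample sizes; both are assumptions for illustration, and the values obtained depend on the random seed.

```r
library(cluster)

set.seed(1)
x <- rbind(cbind(rnorm(200, 0, 8), rnorm(200, 0, 8)),
           cbind(rnorm(300, 50, 8), rnorm(300, 50, 8)))

# Sample sizes to compare (must exceed k and be at most nrow(x))
sizes <- c(10, 44, 200, 500)

# Objective (average dissimilarity to the closest medoid) for each sample size
objective <- sapply(sizes, function(s) {
  clara(x, k = 2, samples = 5, sampsize = s, rngR = TRUE)$objective
})

data.frame(sampsize = sizes, objective = objective)
```

Setting rngR = TRUE makes clara() use R's random number generator, so that set.seed() controls the samples drawn.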

Implementation

To use the CLARA algorithm in R, one must install the cluster package. This package includes a function that performs the CLARA process.

Install the cluster package

install.packages("cluster")

Load the package

library("cluster")

The clara function, provided by the cluster package, can be called as follows:

clara(x, k, metric = "euclidean", stand = FALSE, samples = 5, sampsize = min(n, 40 + 2 * k), trace = 0, medoids.x = TRUE, 
keep.data = medoids.x, rngR = FALSE)

where the arguments are:

  • x: Data matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed.
  • k: Integer, the number of clusters. It is required that 0 < k < n where n is the number of observations (i.e., n = nrow(x)).
  • metric: Character string specifying the metric to be used for calculating dissimilarities between observations. The currently available options are "euclidean" and "manhattan". Euclidean distances are root sum-of-squares of differences, and manhattan distances are the sum of absolute differences.
  • stand: Logical, indicating if the measurements in x are standardized before calculating the dissimilarities. Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's mean absolute deviation.
  • samples: Integer, number of samples to be drawn from the dataset.
  • sampsize: Integer, number of observations in each sample. sampsize should be higher than the number of clusters (k) and at most the number of observations (n = nrow(x)).
  • trace: Integer indicating a trace level for diagnostic output during the algorithm.
  • medoids.x: Logical indicating if the medoids should be returned, identically to some rows of the input data x. If FALSE, keep.data must be false as well, and the medoid indices, i.e., row numbers of the medoids will still be returned (i.med component), and the algorithm saves space by needing one copy less of x.
  • keep.data: Logical indicating if the (scaled if stand is true) data should be kept in the result. Setting this to FALSE saves memory (and hence time), but disables clusplot()ing of the result. Use medoids.x = FALSE to save even more memory.
  • rngR: Logical indicating if R's random number generator should be used instead of the primitive clara()-builtin one. If true, this also means that each call to clara() returns a different result – though only slightly different in good situations.
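The arguments listed above can be combined in a single call. The following sketch overrides several defaults on synthetic data; the data and the particular argument values are chosen only for illustration.

```r
library(cluster)

set.seed(42)
x <- rbind(cbind(rnorm(200, 0, 8), rnorm(200, 0, 8)),
           cbind(rnorm(300, 50, 8), rnorm(300, 50, 8)))

# Manhattan distance on standardized variables, 20 samples of 100 objects each,
# using R's random number generator so that set.seed() applies
fit <- clara(x, k = 2, metric = "manhattan", stand = TRUE,
             samples = 20, sampsize = 100, rngR = TRUE)

fit$i.med      # row numbers of the medoids in x
fit$medoids    # the medoids themselves
fit$clusinfo   # size and dissimilarity summary per cluster
```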

View

There are two ways of viewing the result of a CLARA run. Both use the object of class clara returned by the function.

The first way is to plot the object, which draws a chart of the data. If there are N objects divided into K clusters, the chart contains N points, one per object, colored in K different colors, one per cluster. For example, given the object clarax returned by a call to clara, the plot is produced with:

plot(clarax)

The second way of viewing the result of a CLARA application is to simply print the components of the object of class clara. For example, given the same object clarax of the previous example, one could print its components using:

print(clarax)
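Besides printing the whole object, individual components can be read with the $ operator. A minimal sketch, assuming a clara object fitted on synthetic data like that of the example below; the component names are those listed under "Available components" in the printed output.

```r
library(cluster)

x <- rbind(cbind(rnorm(200, 0, 8), rnorm(200, 0, 8)),
           cbind(rnorm(300, 50, 8), rnorm(300, 50, 8)))
clarax <- clara(x, 2)

clarax$medoids             # coordinates of the medoids
clarax$i.med               # row numbers of the medoids in x
sizes <- table(clarax$clustering)
sizes                      # number of objects per cluster
clarax$clusinfo            # per-cluster summary
```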

Example

Suppose we have 500 objects and each object has two attributes (or features). Our goal is to group these objects into K = 2 groups based on their two features. The function clara can be used to define the groups as follows:

## generate 500 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(200,0,8), rnorm(200,0,8)), cbind(rnorm(300,50,8), rnorm(300,50,8)))
## run clara
clarax <- clara(x, 2)
clarax
clarax$clusinfo

## print components of clarax
print(clarax)

## plot clusters
plot(x, col = clarax$clustering)
## plot medoids (a clara object has no "centers" component)
points(clarax$medoids, col = 1:2, pch = 8)

Result of printing components of clarax:

Call:    clara(x = x, k = 2) 
Medoids:
          [,1]       [,2]
[1,]  1.091033 -0.5367556
[2,] 51.044099 51.0638017
Objective function:      9.946085
Clustering vector:       int [1:500] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
Cluster sizes:           200 300 
Best sample:
 [1]   6  45  51  56  67  75  85  90  94  97 110 111 160 170 176 181 201 219 249
[20] 260 264 275 296 304 317 319 337 340 361 362 369 370 374 379 397 398 411 420
[39] 422 424 436 448 465 489

Available components:
 [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
 [6] "clusinfo"   "diss"       "call"       "silinfo"    "data" 

Figure 2: Result of plotting clarax

Case study

In this section, we illustrate a case study using CLARA.

Scenario

The case study uses the USArrests data set that ships with R. It contains the number of arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, together with the percentage of the population living in urban areas.
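As a quick check of the description above, the data can be inspected in R. The sketch assumes that the data set is the USArrests data frame from R's built-in datasets package, which is the object used in the Execution section.

```r
# USArrests is available in a default R session (package datasets)
data(USArrests)

dims <- dim(USArrests)      # number of states and number of variables
vars <- names(USArrests)    # variable names

dims
vars
head(USArrests)             # first six states
summary(USArrests)          # range and quartiles of each variable
```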

Input Data

The input is the USArrests data frame: 50 rows, one per state, and four numeric columns named Murder, Assault, UrbanPop and Rape.

Execution

The function "clara" was used as follows:

## import data
x <- USArrests

## run CLARA
clarax <- clara(x[1:4], 3)

## print components of clarax
print(clarax)

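## plot clusters (scatterplot matrix of the four variables)
plot(x, col = clarax$clustering)
## plot Murder against Assault and mark the medoids
plot(x$Assault, x$Murder, col = clarax$clustering, xlab = "Assault", ylab = "Murder")
points(clarax$medoids[, c("Assault", "Murder")], col = 1:3, pch = 8)
## mark and label New York (Assault = 254, Murder = 11.1)
points(254, 11.1, pch = 16)
text(254, 11.11, labels = "New York", pos = 3)
## add the straight line Murder = 0.63168 + 0.04191 * Assault
lines(x$Assault, 0.63168 + 0.04191 * x$Assault)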

Output

Printing the object of class clara returned by the call gives:

Call:    clara(x = x[1:4], k = 3) 
Medoids:
         Murder Assault UrbanPop Rape
Michigan   12.1     255       74 35.1
Missouri    9.0     178       70 28.2
Nebraska    4.3     102       62 16.5
Objective function:      29.31019
Clustering vector:       Named int [1:50] 1 1 1 2 1 2 3 1 1 2 3 3 1 3 3 3 3 1 ...
 - attr(*, "names")= chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" "California" "Colorado" "Connecticut" ...
Cluster sizes:           16 14 20 
Best sample:
 [1] Alabama        Alaska         Arizona        Arkansas       California    
 [6] Colorado       Delaware       Florida        Georgia        Idaho         
[11] Illinois       Indiana        Iowa           Kansas         Kentucky      
[16] Louisiana      Maine          Maryland       Massachusetts  Michigan      
[21] Minnesota      Mississippi    Missouri       Montana        Nebraska      
[26] Nevada         New Hampshire  New York       North Carolina North Dakota  
[31] Ohio           Oklahoma       Oregon         Pennsylvania   Rhode Island  
[36] South Carolina South Dakota   Tennessee      Texas          Utah          
[41] Vermont        Virginia       Washington     West Virginia  Wisconsin     
[46] Wyoming       

Available components:
 [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
 [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"  

The result of plotting the object returned by the call is shown below:


Figure 3: Results of the example

Analysis

CLARA generated three relatively homogeneous clusters, consisting of 16, 14 and 20 states. Analyzing the clusters, we can relate each group to one of three classes of states:

  • The cluster formed by Alabama, Alaska, Arizona, California, Delaware, Florida, Illinois, Louisiana, Maryland, Michigan, Mississippi, Nevada, New Mexico, New York, North Carolina and South Carolina has the highest Murder, Assault and Rape arrests (per 100,000). Its medoid, Michigan, also has the highest urban population share of the three medoids (74%).
  • The cluster formed by Arkansas, Colorado, Georgia, Massachusetts, Missouri, New Jersey, Oklahoma, Oregon, Rhode Island, Tennessee, Texas, Virginia, Washington and Wyoming has intermediate Murder, Assault and Rape arrests (per 100,000). Its medoid, Missouri, has an urban population share of 70%.
  • The cluster formed by Connecticut, Hawaii, Idaho, Indiana, Iowa, Kansas, Kentucky, Maine, Minnesota, Montana, Nebraska, New Hampshire, North Dakota, Ohio, Pennsylvania, South Dakota, Utah, Vermont, West Virginia and Wisconsin has the lowest Murder, Assault and Rape arrests (per 100,000). Its medoid, Nebraska, has the lowest urban population share of the three medoids (62%).
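The membership lists and the per-cluster averages can be recomputed from the fitted object. The following is a minimal sketch, assuming the same call as in the Execution section; the exact partition may differ between versions of the cluster package.

```r
library(cluster)

clarax <- clara(USArrests, 3)

# States in each cluster
members <- split(rownames(USArrests), clarax$clustering)
members

# Mean of each variable within each cluster
means <- aggregate(USArrests, by = list(cluster = clarax$clustering), FUN = mean)
means
```

Comparing the rows of means shows how the clusters differ on each of the four variables.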

Comparing the states of the two extreme clusters (1 and 3) with [3] suggests reasons for each state to be in its group. California, although it has a good Human Development Index and good Median Personal Earnings, has the third-highest unemployment rate in the USA. The second-highest is Michigan's and the highest is Nevada's, and both of these states are also in cluster one. Connecticut has the highest Human Development Index and is in cluster three. Wyoming has the highest percentage of people with a high school diploma and is in cluster two. Other reasons can be found by reading this work together with [3].

References

  1. Chih-Ping Wei, Yen-Hsien Lee, and Che-Ming Hsu, Empirical Comparison of Fast Clustering Algorithms for Large Data Sets.
  2. The R Development Core Team, R: A Language and Environment for Statistical Computing.
  3. American Human Development Project, Mapping the Measure of America.
  4. Kaufman, L. and Rousseeuw, P. J., Clustering by Means of Medoids.