An Empirical Investigation on Classical Clustering Methods
IUP Journal of Genetics & Evolution, Vol. 2, No. 3, pp. 74-79, August 2009
Posted: 11 Aug 2009
Date Written: August 10, 2009
Five classical clustering methods were compared on simulated multivariate data on paddy crop genotypes: four hierarchical methods (single linkage, average-between linkage, average-within linkage, and Ward's) and one non-hierarchical method (k-means), using five distance measures: squared Euclidean, city block, Chebyshev, Pearson correlation, and Minkowski. Performance was assessed by the average percentage probability of misclassification and its standard error. The performance of the hierarchical methods varied with the distance measure used; in the majority of cases, squared Euclidean distance gave the lowest average percentage probability of misclassification, followed by city block distance. Among the five methods, Ward's method performed best, with the lowest average percentage probability of misclassification, followed by the non-hierarchical k-means method, irrespective of sample size.
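The kind of comparison described in the abstract can be illustrated with a minimal Python sketch using NumPy and SciPy. It is not the authors' procedure, and several parts are assumptions: the simulated data are plain Gaussian groups (the hypothetical `simulate` helper) rather than paddy genotype data, the Minkowski order `p=3` is arbitrary, and misclassification is scored by matching clusters to true groups with the Hungarian algorithm. SciPy offers no average-within linkage, so that method is omitted, and Ward's method is run on Euclidean distance only, since SciPy's Ward linkage is defined for that case.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import pdist


def misclassification_pct(true_labels, cluster_labels):
    """Percentage of points misassigned under the best cluster-to-group matching."""
    groups = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # Contingency table: rows are true groups, columns are found clusters.
    table = np.array([[np.sum((true_labels == g) & (cluster_labels == c))
                       for c in clusters] for g in groups])
    rows, cols = linear_sum_assignment(-table)  # maximise matched counts
    return 100.0 * (1.0 - table[rows, cols].sum() / len(true_labels))


def simulate(rng, n_per_group=30, n_traits=4, n_groups=3):
    """Hypothetical stand-in data: Gaussian groups with randomly shifted means."""
    means = rng.normal(scale=4.0, size=(n_groups, n_traits))
    X = np.vstack([rng.normal(loc=m, scale=1.0, size=(n_per_group, n_traits))
                   for m in means])
    y = np.repeat(np.arange(n_groups), n_per_group)
    return X, y


def compare(n_reps=20, n_groups=3, seed=0):
    """Return {(method, metric): (mean % misclassified, standard error)}."""
    rng = np.random.default_rng(seed)
    # Metric names are SciPy's; p=3 for Minkowski is an arbitrary assumption.
    metrics = {"sqeuclidean": {}, "cityblock": {}, "chebyshev": {},
               "correlation": {}, "minkowski": {"p": 3}}
    results = {}
    for rep in range(n_reps):
        X, y = simulate(rng, n_groups=n_groups)
        # Single and average (between-groups) linkage under each distance.
        for method in ("single", "average"):
            for metric, kwargs in metrics.items():
                Z = linkage(pdist(X, metric=metric, **kwargs), method=method)
                labels = fcluster(Z, t=n_groups, criterion="maxclust")
                results.setdefault((method, metric), []).append(
                    misclassification_pct(y, labels))
        # Ward's method on raw observations with Euclidean distance.
        Z = linkage(X, method="ward", metric="euclidean")
        labels = fcluster(Z, t=n_groups, criterion="maxclust")
        results.setdefault(("ward", "euclidean"), []).append(
            misclassification_pct(y, labels))
        # Non-hierarchical k-means with k-means++ initialisation.
        _, labels = kmeans2(X, n_groups, minit="++", seed=rep)
        results.setdefault(("kmeans", "euclidean"), []).append(
            misclassification_pct(y, labels))
    return {key: (float(np.mean(v)), float(np.std(v, ddof=1) / np.sqrt(len(v))))
            for key, v in results.items()}


if __name__ == "__main__":
    for (method, metric), (mean, se) in sorted(compare().items()):
        print(f"{method:8s} {metric:12s} {mean:6.2f} +/- {se:.2f}")
```

Each entry of the returned dictionary gives the mean percentage misclassified over the replicates and its standard error, which mirrors the two summary statistics named in the abstract. The numbers depend entirely on the assumed simulation settings and say nothing about the paper's reported results.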
Keywords: Cluster analysis, Rice, Hierarchical methods, Non-hierarchical method, Distance measures