Abstract

Clustering is a data mining technique that groups data items having similar values for some of their attributes. Clustering is used in various fields: in the health sector to group patients with similar symptoms of a disease, in the banking sector to group customers who have dues on their credit card payments, and in market analysis to identify customers with similar buying patterns. One of the best-known clustering algorithms is K-means, one of the simplest unsupervised learning algorithms, known for its speed and simplicity. However, this algorithm suffers from two major limitations. First, the clusters produced are sensitive to the selection of the initial centroids (cluster centers). Second, the algorithm requires the number of clusters as input. Choosing this number sometimes requires domain-specific knowledge, and many problems can occur if the user is not a domain expert. As mentioned earlier, the output of the K-means algorithm depends heavily on the selection of the initial cluster centers. Because the initial cluster centers are chosen randomly, the K-means algorithm does not guarantee