摘要
Clustering is an unsupervised learning method used to organize raw data in such a way that those with the same (similar) characteristics are found in the same class and those that are dissimilar are found in different classes. In this day and age, the very rapid increase in the amount of data being produced brings new challenges in the analysis and storage of this data. Recently, there is a growing interest in key areas such as real-time data mining, which reveal an urgent need to process very large data under strict performance constraints. The objective of this paper is to survey four algorithms including K-Means algorithm, FCM algorithm, EM algorithm and BIRCH, used for data clustering and then show their strengths and weaknesses. Another task is to compare the results obtained by applying each of these algorithms to the same data and to give a conclusion based on these results.
Clustering is an unsupervised learning method used to organize raw data in such a way that those with the same (similar) characteristics are found in the same class and those that are dissimilar are found in different classes. In this day and age, the very rapid increase in the amount of data being produced brings new challenges in the analysis and storage of this data. Recently, there is a growing interest in key areas such as real-time data mining, which reveal an urgent need to process very large data under strict performance constraints. The objective of this paper is to survey four algorithms including K-Means algorithm, FCM algorithm, EM algorithm and BIRCH, used for data clustering and then show their strengths and weaknesses. Another task is to compare the results obtained by applying each of these algorithms to the same data and to give a conclusion based on these results.