Study on the Development and Implementation of Different Big Data Clustering Methods

Abstract: Clustering is an unsupervised learning method that organizes raw data so that items with similar characteristics fall into the same class and dissimilar items fall into different classes. The rapid growth in the volume of data being produced brings new challenges for analyzing and storing it. Growing interest in areas such as real-time data mining reveals an urgent need to process very large datasets under strict performance constraints. This paper surveys four data clustering algorithms, K-Means, Fuzzy c-Means (FCM), Expectation Maximization (EM), and BIRCH, and describes their strengths and weaknesses. It also compares the results obtained by applying each algorithm to the same data and draws a conclusion from those results.
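As a rough illustration of the idea in the abstract (similar items grouped into the same class), the following is a minimal sketch of K-Means in plain Python. It is not the paper's implementation: the function name `kmeans`, the six sample points, and the choice of initial centroids are all assumptions made up for this example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Minimal K-Means sketch: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(old)
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two visibly separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Assumed initialization: one seed taken from each group.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

With this seeding, the first three points share one label and the last three share the other. K-Means is sensitive to initialization, so a different choice of starting centroids may converge to a different partition on less cleanly separated data.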

Authors: Jean Pierre Ntayagabiri; Jérémie Ndikumagenge; Longin Ndayisaba; Boribo Kikunda Philippe (Center of Research in Infrastructure, Environment and Technology (CRIET), University of Burundi, Bujumbura, Burundi; Catholic University of Bukavu, Bukavu, Democratic Republic of the Congo)

Affiliations: Center of Research in Infrastructure, Environment and Technology (CRIET), University of Burundi; Catholic University of Bukavu

Source: Open Journal of Applied Sciences, 2023, No. 7, pp. 1163-1177 (15 pages)

Keywords: Clustering; K-Means; Fuzzy c-Means; Expectation Maximization; BIRCH

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
