Abstract
Data on the Internet are large in scale and highly dynamic, so the knowledge or rules discovered from them are often imprecise and incomplete. To overcome these shortcomings, fuzzy theory is introduced: reasonable clusters are formed by searching for fuzzy similarity upper approximation sets, and average information entropy is used to select the best clustering while the number of clusters is being determined. The fuzzy clustering algorithm is also embedded into the WEKA platform, making use of the classes and visualization functions of the open-source WEKA and extending the clustering algorithms available in it. Experiments show that the algorithm achieves high accuracy and fast convergence on large, noisy data sets with irregular distributions.
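The abstract does not give the entropy criterion or the fuzzy similarity upper approximation in detail, so the following is only a minimal sketch of the general idea of scoring candidate cluster counts by the average information entropy of their fuzzy memberships. It assumes a standard fuzzy c-means style membership formula as a stand-in for the paper's method, assumes that lower average entropy indicates a better clustering, and uses hypothetical names (`fuzzy_memberships`, `average_entropy`, `pick_cluster_count`) and hand-picked candidate centers purely for illustration.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """FCM-style membership of each 1-D point in each center (rows sum to 1).

    Assumption: this stands in for the paper's fuzzy similarity measure.
    """
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: assign it fully to the match(es).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
            continue
        weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def average_entropy(memberships):
    """Mean Shannon entropy (in nats) over the membership rows."""
    total = 0.0
    for row in memberships:
        total += -sum(u * math.log(u) for u in row if u > 0.0)
    return total / len(memberships)


def pick_cluster_count(points, candidate_centers):
    """Return (best_k, scores); best_k has the lowest average entropy.

    Assumption: "lowest entropy wins" is an illustrative criterion only.
    """
    scores = {
        k: average_entropy(fuzzy_memberships(points, centers))
        for k, centers in candidate_centers.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
    candidates = {2: [1.0, 5.0], 3: [1.0, 3.0, 5.0]}
    best_k, scores = pick_cluster_count(points, candidates)
    print(best_k, scores)
```

In this toy data the two-center candidate produces crisper memberships than the three-center one, so it receives the lower score. A real implementation would obtain the candidate clusterings from the clustering algorithm itself rather than from fixed centers.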
Source
《解放军理工大学学报(自然科学版)》
EI
Peking University Core Journal (北大核心)
2012, Issue 1, pp. 22-26 (5 pages)
Journal of PLA University of Science and Technology (Natural Science Edition)
Funding
National 863 Program of China (Grant No. 2007AA01Z126)
Keywords
fuzzy sets
data mining
fuzzy clustering
similarity upper approximation
WEKA
clustering algorithm