Research on Big Data Cleaning Methods Based on Improved K-means (cited by: 11)
Abstract: With the advent of the big data era, improving data quality has become key to improving the efficiency of data utilization. This paper introduces the data cleaning process in a big data environment. The traditional Logsf method is improved to extract the main features of big data and reduce its dimensionality. A K-means algorithm improved with Canopy is proposed to identify abnormal data quickly. The improved Logsf method is compared with the traditional Logsf method in simulation, and the results show that the improved method achieves higher accuracy and faster data processing. The improved and the traditional K-means algorithms are both applied to the analysis of abnormal data, and the experimental results verify that the proposed method is more accurate and faster.
Authors: LIN Nügui (林女贵); WU Yuanlin (吴元林) (State Grid Fujian Electric Power Co. Ltd., Fuzhou 350001, China; State Grid Yili Technology Co. Ltd., Fuzhou 350001, China)
Source: Microcomputer Applications (《微型电脑应用》), 2021, No. 11, pp. 133-136 (4 pages)
Keywords: big data; data cleaning; K-means; Logsf
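
The abstract names Canopy-improved K-means as the way abnormal data are identified but gives no algorithmic detail. The following is a minimal sketch of the general technique, not the authors' implementation: Canopy picks the initial centers (and so the number of clusters), K-means refines them, and points far from their final centroid are flagged. The function names, the thresholds `t1`, `t2`, `max_dist`, and `min_members`, and the distance-based anomaly rule are all assumptions made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def canopy_seeds(points, t1, t2, min_members=2):
    """Canopy pass: return initial centers for K-means.

    t1 is the loose threshold (canopy membership), t2 the tight one
    (points this close to a center cannot start a new canopy).
    Canopies with fewer than min_members points are dropped so that
    isolated points do not seed their own cluster (an assumption).
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = list(points)
    seeds = []
    while remaining:
        center = remaining[0]
        members = [p for p in points if dist(p, center) <= t1]
        if len(members) >= min_members:
            seeds.append(mean(members))
        remaining = [p for p in remaining if dist(p, center) > t2]
    return seeds


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd iterations starting from the given seeds."""
    if not seeds:
        raise ValueError("no seeds: loosen t1 or lower min_members")
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = mean(members)
    return centers, labels


def find_anomalies(points, t1, t2, max_dist, min_members=2):
    """Flag points farther than max_dist from their cluster centroid."""
    seeds = canopy_seeds(points, t1, t2, min_members)
    centers, labels = kmeans(points, seeds)
    return [p for p, lab in zip(points, labels)
            if dist(p, centers[lab]) > max_dist]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
            (5.0, 30.0)]  # the last point is far from both groups
    print(find_anomalies(data, t1=4.0, t2=2.0, max_dist=8.0))
```

Seeding from Canopy avoids choosing the cluster count by hand and removes the random initialization of standard K-means, which is the usual motivation for combining the two. One limitation of this sketch is that an outlier still pulls its assigned centroid toward itself before being flagged, so `max_dist` has to be chosen with that drift in mind.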