Research on Completion Algorithm for Incomplete Data Based on Ensemble Learning

Cited by: 5
Abstract: In data mining, completing incomplete data can effectively repair missing information and improve both mining efficiency and the modeling success rate. In big data scenarios, the complexity of missing-data mechanisms and the multi-source complementarity of data become apparent, so earlier algorithms that repair data using either data distribution analysis or association analysis in isolation have limited effect. Combining the two perspectives of data distribution and attribute association, this paper proposes HELITW, a heterogeneous ensemble learning model for data completion that uses eight algorithms, including EM, KNN, and RF, as base learners. Starting from five standard UCI machine learning datasets, including Iris and Boston, datasets with a random missing mechanism were built at missing ratios of 10%, 20%, and 30%, and HELITW was compared with eight other algorithms in data completion experiments on them. The results show that as the missing ratio increases, the repair quality of all nine models generally declines; under the same experimental conditions, however, HELITW completes the data better than the other eight models.
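The experimental setup described in the abstract (mask values at random at 10%, 20%, and 30%, impute, and score the result) can be sketched as follows. This is a minimal illustration and not the HELITW model: it assumes only two base imputers (column mean and a simplified KNN) instead of eight, combines them with equal weights chosen for illustration, uses synthetic data rather than the UCI datasets, and scores with RMSE, a metric the abstract does not name. All function names are hypothetical.

```python
import numpy as np

def mask_at_random(X, ratio, rng):
    """Return a float copy of X with roughly `ratio` of its cells set to NaN."""
    X_missing = X.astype(float).copy()
    X_missing[rng.random(X.shape) < ratio] = np.nan
    return X_missing

def mean_impute(X_missing):
    """Distribution-style base imputer: fill each gap with its column mean."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def knn_impute(X_missing, k=5):
    """Association-style base imputer (simplified KNN).

    Distances are computed on the mean-filled matrix; each gap is filled
    with the mean of the observed values among the k nearest rows.
    """
    filled = mean_impute(X_missing)
    out = filled.copy()
    missing = np.isnan(X_missing)
    for i in np.where(missing.any(axis=1))[0]:
        dist = np.sqrt(((filled - filled[i]) ** 2).sum(axis=1))
        dist[i] = np.inf  # a row is not its own neighbour
        neighbours = np.argsort(dist)[:k]
        for j in np.where(missing[i])[0]:
            donors = neighbours[~missing[neighbours, j]]
            if donors.size:  # otherwise keep the column-mean fallback
                out[i, j] = X_missing[donors, j].mean()
    return out

def ensemble_impute(X_missing, weights=(0.5, 0.5)):
    """Weighted average of the base imputers (weights are an assumption)."""
    candidates = [mean_impute(X_missing), knn_impute(X_missing)]
    return sum(w * c for w, c in zip(weights, candidates))

def rmse(X_true, X_filled, mask):
    """Root-mean-square error over the cells that were masked out."""
    return float(np.sqrt(np.mean((X_true[mask] - X_filled[mask]) ** 2)))

# Synthetic, correlated data standing in for a real dataset (assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(150, 1))
X = latent @ np.array([[1.0, 0.8, -0.5, 0.3]]) + 0.1 * rng.normal(size=(150, 4))

results = {}
for ratio in (0.1, 0.2, 0.3):
    X_missing = mask_at_random(X, ratio, rng)
    mask = np.isnan(X_missing)
    results[ratio] = rmse(X, ensemble_impute(X_missing), mask)
    print(f"missing ratio {ratio:.0%}: ensemble RMSE = {results[ratio]:.4f}")
```

Observed cells pass through unchanged because the weights sum to 1 and every base imputer leaves them untouched; only the masked cells are blended.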
Authors: DING Jing'an (丁敬安); ZHANG Xin-hai (张欣海); HU Bo (胡博); ZHOU Guo-min (周国民) (Hangzhou Sanhui Digital Information Technology Co., Ltd., Hangzhou 310053, China; School of Management Science and Engineering, Anhui University of Technology, Ma'anshan 243032, China; China Academy of Electronics and Information Technology, Beijing 100041, China; National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data, Beijing 100041, China; Zhejiang Police College, Hangzhou 310053, China)
Source: Journal of China Academy of Electronics and Information Technology (《中国电子科学研究院学报》), Peking University Core Journal (北大核心), 2020, Issue 1, pp. 78-83, 91 (7 pages in total)
Funding: National Key Research and Development Program of China under the 13th Five-Year Plan (2017YFC0820503).
Keywords: incomplete data; UCI dataset; heterogeneous ensemble learning; HELITW
