Abstract
Existing methods for learning Bayesian networks do not handle missing data effectively. This paper presents two methods for working with incomplete data. The first repairs the incomplete data set into a complete one, performs the computation on the completed data, and uses the result as an approximation for the incomplete case. The second performs an approximate computation directly on the incomplete data set, and this approximation is asymptotically correct. Experimental results show that the first method gives accurate results but is less efficient, while the second is more efficient and performs well when the data set is large. Both methods also outperform other methods for handling missing data.
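The abstract does not say how the data set is repaired or how the direct approximation is computed, so the sketch below only illustrates the two general strategies on a toy problem: estimating one conditional probability P(B=1 | A=1), the kind of quantity a Bayesian network's parameters are built from. Mode imputation and available-case counting are stand-ins chosen for illustration, not the paper's procedures, and the names `impute_mode`, `conditional_prob`, and `rows` are hypothetical.

```python
from collections import Counter

# Toy records (A, B); None marks a missing value. Illustrative data only.
rows = [(1, 1), (1, 1), (1, 0), (0, 0), (0, None),
        (None, 1), (1, None), (0, 0), (1, 1)]

def impute_mode(rows):
    """Strategy 1 (assumed stand-in): complete the data set by filling each
    missing value with the most frequent observed value in its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [tuple(modes[j] if r[j] is None else r[j] for j in range(n_cols))
            for r in rows]

def conditional_prob(rows, child, parent, child_val, parent_val):
    """Estimate P(child=child_val | parent=parent_val) by counting, using
    only the rows in which both variables are observed."""
    usable = [r for r in rows
              if r[child] is not None and r[parent] is not None]
    parent_count = sum(1 for r in usable if r[parent] == parent_val)
    joint_count = sum(1 for r in usable
                      if r[parent] == parent_val and r[child] == child_val)
    return joint_count / parent_count if parent_count else float("nan")

# Strategy 1: repair first, then compute on the completed data set.
p_completed = conditional_prob(impute_mode(rows), 1, 0, 1, 1)

# Strategy 2 (assumed stand-in): compute directly on the incomplete data,
# skipping rows where either variable is missing.
p_direct = conditional_prob(rows, 1, 0, 1, 1)

print(p_completed, p_direct)
```

On this toy data the two estimates differ (5/6 versus 3/4). With only nine records the choice of strategy visibly changes the estimate, which fits the abstract's remark that the direct method performs well when the data set is large.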
Source
Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2000, No. 9, pp. 65-68 (4 pages)
Funding
National Natural Science Foundation of China project (79990580)
National "973" Basic Research Fund project (G1998030414)