
Naive Bayes Algorithm Based on Data Filling and Continuous Attributes (times cited: 4)

Naive Bayes based on data filling and continuous attribute
Abstract: In classification tasks, Naive Bayes (NB) usually assumes that the numerical continuous attributes of the training samples follow a normal distribution, and its classification accuracy also depends on the completeness of the training data. Real sampled data rarely satisfy either requirement. To handle missing data, the Naive Bayes classifier learns its parameters from the available incomplete data using the Expectation-Maximization (EM) algorithm. To handle numerical continuous attributes that are not normally distributed, the distribution density obtained by kernel density estimation, together with a new analytical calculation method, is used to compute the maximum posterior probability. Classification experiments on standard data sets verify the effectiveness of these improvements. Finally, the improved algorithm, EM-DNB, is applied to predicting protein purification processes in bioengineering, and the experimental results show that prediction accuracy is somewhat improved.
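The abstract combines two ideas: class-conditional densities for continuous attributes estimated by kernel density estimation rather than a fitted normal distribution, and parameter learning from incomplete data. The Python sketch below illustrates only the kernel-density idea plus one simple way to tolerate missing values; it is not the authors' EM-DNB. Assumptions: a Gaussian kernel, the normal-reference bandwidth rule h = 1.06·σ·n^(-1/5), `None` as the missing-value marker, and the hypothetical names `KDENaiveBayes`, `log_kde`, and `rule_of_thumb_bandwidth`. Missing values are handled here by marginalization (a missing attribute's density factor integrates to 1 and is dropped), which is a swapped-in technique, not the EM procedure described in the abstract.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth h = 1.06 * sigma * n**(-1/5) (assumed choice)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    sigma = math.sqrt(var)
    # Fall back to a small positive width when all observed values coincide.
    return 1.06 * sigma * n ** (-0.2) if sigma > 0 else 1e-3


def log_kde(x, values, h):
    """log f(x) for a Gaussian KDE: f(x) = 1/(n*h) * sum_i phi((x - x_i) / h)."""
    exponents = [-0.5 * ((x - v) / h) ** 2 for v in values]
    m = max(exponents)  # log-sum-exp shift to avoid underflow far from the data
    s = sum(math.exp(e - m) for e in exponents)
    return m + math.log(s) - math.log(len(values) * h * SQRT_2PI)


class KDENaiveBayes:
    """Hypothetical Naive Bayes with per-class Gaussian KDEs per attribute.

    None marks a missing attribute value (assumed convention).
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_attrs = len(X[0])
        self.log_prior, self.samples, self.bandwidth = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(y))
            for j in range(n_attrs):
                # Build each class-conditional density from observed values only.
                vals = [r[j] for r in rows if r[j] is not None]
                if not vals:
                    raise ValueError(f"class {c!r} has no observed values for attribute {j}")
                self.samples[c, j] = vals
                self.bandwidth[c, j] = rule_of_thumb_bandwidth(vals)
        return self

    def predict(self, x):
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum_j log f(x_j | c) = log posterior up to a constant
            score = self.log_prior[c]
            for j, v in enumerate(x):
                if v is None:
                    # A missing attribute's density integrates to 1,
                    # so marginalizing it out simply drops its factor.
                    continue
                score += log_kde(v, self.samples[c, j], self.bandwidth[c, j])
            if score > best_score:  # keep the maximum a posteriori class
                best, best_score = c, score
        return best


if __name__ == "__main__":
    X = [[1.0, 5.0], [1.2, None], [0.9, 5.2],
         [4.0, 1.0], [4.1, 1.1], [None, 0.9]]
    y = ["a", "a", "a", "b", "b", "b"]
    model = KDENaiveBayes().fit(X, y)
    print(model.predict([1.1, None]))  # a
    print(model.predict([None, 1.0]))  # b
```

Working in log space with a log-sum-exp keeps the score finite even when a test value lies far from every kernel center, which matters once many per-attribute densities are multiplied together.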
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD-indexed, Peking University core journal, 2016, no. 1, pp. 133-140 (8 pages)
Funding: National Science and Technology Major Project (No. 2009ZX09306-004, No. 2011ZX09101-008-09)
Keywords: Naive Bayes (NB); Expectation-Maximization (EM) algorithm; continuous attributes; kernel density estimation; protein purification