Abstract
Naive Bayes classification is widely used in text classification because it is simple and fast. However, when the training samples are unevenly distributed across classes, the classification accuracy of Naive Bayes is unsatisfactory. To address this problem, we propose a weighted complement-based Naive Bayes text classification algorithm. The algorithm represents the features of each category by the features of its complement set (all the other categories) and normalizes the feature weights. Experiments compare the text classification performance of this method with that of the traditional Naive Bayes method. The results show that the weighted complement-based Naive Bayes algorithm achieves better text classification results.
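The idea of estimating each class from its complement and then normalizing the weights can be illustrated with a short sketch. This code is not the paper's own method. It assumes the weighted complement Naive Bayes formulation of Rennie et al. (2003): Laplace smoothing with alpha = 1, log weights scaled so their absolute values sum to 1, and assignment of a document to the class whose weighted score is lowest. The paper's exact weighting scheme may differ. The names `train_wcnb` and `classify` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_wcnb(docs, labels, alpha=1.0):
    """Return normalized complement log-weights w[c][term] for each class c.

    Assumes the Rennie et al. (2003) style formulation; a sketch only.
    """
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    # Term counts observed inside each class.
    per_class = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        per_class[c].update(doc)
    total = Counter()
    for cnt in per_class.values():
        total.update(cnt)

    weights = {}
    for c in classes:
        # Complement counts: occurrences of each term in every class except c.
        comp = {t: total[t] - per_class[c][t] for t in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        # Smoothed log probability of each term in the complement of c.
        logs = {t: math.log((comp[t] + alpha) / denom) for t in vocab}
        # Normalize so the absolute weights of each class sum to 1.
        norm = sum(abs(v) for v in logs.values()) or 1.0
        weights[c] = {t: v / norm for t, v in logs.items()}
    return weights


def classify(weights, doc):
    """Pick the class whose complement explains the document worst (argmin)."""
    counts = Counter(doc)

    def score(c):
        w = weights[c]
        return sum(n * w[t] for t, n in counts.items() if t in w)

    return min(weights, key=score)


# Hypothetical, deliberately unbalanced toy corpus: 3 "sports" vs 1 "finance".
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["ball", "team"],
    ["stock", "market", "price"],
]
labels = ["sports", "sports", "sports", "finance"]
w = train_wcnb(docs, labels)
print(classify(w, ["stock", "price"]), classify(w, ["goal", "team"]))
```

Each class's weights are estimated from the documents of all the other classes, so a small class such as "finance" still draws on a large complement. This matches the motivation the abstract gives for handling uneven class distributions.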
Source
Computer Applications and Software (计算机应用与软件)
CSCD
Peking University Core Journal
2014, No. 9, pp. 253-255 (3 pages)
Funding
Natural Science Foundation of Zhejiang Province (Y12F020128)
Keywords
Text classification
Naive Bayes
Complement
Weight