Abstract
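Information on the Internet is vast and disorderly. Extracting the most useful information from it is a major goal of information processing, and text classification is a powerful means of organizing and managing data. Because the maximum entropy model can combine all kinds of observed probabilistic knowledge, whether related or unrelated, and gives good results on many problems, it is introduced here into the study of Chinese text classification, and a feature aggregation algorithm is used to improve the effectiveness of feature selection. Experiments show that, compared with Bayes, KNN, and SVM, three high-performing algorithms, the maximum-entropy-based text classification algorithm achieves higher classification accuracy.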
The Internet has become the main source from which people obtain information of all kinds, and text classification has become a key technology for organizing and processing document data. The maximum entropy model, a probability estimation technique widely used for a variety of natural language tasks, is applied to text classification, and a feature aggregation algorithm is used to select effective features. Experimental results show that, compared with Bayes, KNN, and SVM, the proposed text classification algorithm achieves better performance.
Source
《计算机应用与软件》
CSCD
Peking University Core Journal (北大核心)
2008, Issue 3, pp. 263-264, 277 (3 pages)
Computer Applications and Software
Keywords
Text classification
Maximum entropy model
Feature selection