Research on a Text Classification Method Based on Multi-Type Classifier Fusion (多类型分类器融合的文本分类方法研究) Cited by: 3
Abstract: Most traditional text classification methods use a single classifier, but different classifiers emphasize different aspects of the classification task, so any single classification method has limitations. Likewise, each feature extraction method considers feature words from a different angle. To address these problems, this paper proposes a text classification method based on multi-type classifier fusion. The model uses word2vec, principal component analysis (PCA), latent semantic indexing (LSI), and TF-IDF as the feature extraction methods for the fused classifiers. Because the weighted voting method for multi-type classifiers ignores category information, the paper also proposes a class-weighted method for computing classifier weights. Experimental results show that the multi-type classifier fusion method performs well on a binary-class corpus, a multi-class corpus, and a domain-specific corpus, and that the class-weighted classifier weight calculation improves classification performance by 1.19% over the multi-type classifier fusion method.
Authors: Li Huifu (李惠富); Lu Guang (陆光) (College of Information & Computer Engineering, Northeast Forestry University, Harbin 150040, China)
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2019, No. 3, pp. 752-755 (4 pages)
Funding: Natural Science Foundation of Heilongjiang Province (F201201)
Keywords: text classification; classifier fusion; principal component analysis; latent semantic indexing
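The abstract names class-weighted voting but does not give the weight formula. As a rough illustration only, the sketch below assumes that each classifier receives one weight per class, estimated as its per-class precision on a held-out validation set, and that each classifier adds the weight of the class it predicts to that class's score. The function names `per_class_precision` and `class_weighted_vote`, the precision-based weights, and the toy data are all assumptions for illustration, not the authors' method.

```python
import numpy as np


def per_class_precision(y_true, y_pred, n_classes):
    """Assumed weight definition: for each class c, the fraction of this
    classifier's predictions of c that were correct on validation data."""
    weights = np.zeros(n_classes)
    for c in range(n_classes):
        predicted_c = (y_pred == c)
        if predicted_c.any():
            weights[c] = np.mean(y_true[predicted_c] == c)
    return weights


def class_weighted_vote(preds, weights):
    """Fuse label predictions from several classifiers.

    preds:   int array, shape (n_classifiers, n_samples), predicted labels
    weights: float array, shape (n_classifiers, n_classes), class-specific weights
    Returns the fused label for each sample.
    """
    n_classifiers, n_samples = preds.shape
    n_classes = weights.shape[1]
    scores = np.zeros((n_samples, n_classes))
    rows = np.arange(n_samples)
    for k in range(n_classifiers):
        # Classifier k votes for its predicted class with the weight
        # it holds for that particular class.
        scores[rows, preds[k]] += weights[k, preds[k]]
    return scores.argmax(axis=1)


# Toy validation data (hypothetical): true labels and three classifiers' outputs.
y_val = np.array([0, 0, 1, 1])
val_preds = np.array([
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
])
weights = np.stack([per_class_precision(y_val, p, 2) for p in val_preds])

# Fuse the three classifiers' predictions on two new samples.
test_preds = np.array([
    [0, 1],
    [1, 0],
    [0, 1],
])
print(class_weighted_vote(test_preds, weights))
```

With a single weight per classifier, a classifier that is reliable on one class but weak on another contributes the same vote in both cases. Per-class weights let the fused decision differ from a plain majority vote when a classifier that is strong on a class disagrees with several weaker ones.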
