
Improvement of Chinese Text Categorization Based on Associate Rules (cited by: 6)
Abstract With the rapid growth in the number of Chinese electronic publications and Web documents, automatic Chinese text categorization is becoming increasingly important. This paper improves an automatic Chinese text categorization method based on association rule mining. Each document is treated as a transaction and each keyword as an item; a feature-weight threshold is introduced during text preprocessing; and an improved CDD (Class Differentiate Degree) algorithm is applied when the constructed classifier classifies unknown documents. Experimental results show that the algorithm obtains understandable rules relatively quickly and achieves good macro-average and micro-average values.
Source Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), CAS, 2007, No. 2, pp. 114-117 (4 pages)
Funding Supported by the Natural Science Foundation of the Chongqing Science and Technology Commission, grant No. CSTC2006BB2021
Keywords association rule mining; Chinese text; automatic text classification algorithm
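
The abstract describes the approach only in outline, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It treats each document as a transaction of keywords, drops keywords whose weight falls below a threshold, mines "keyword set -> class" rules by support and confidence, and classifies a new document from the rules it matches. Everything concrete here is an assumption for illustration: the function names, the toy weights, the brute-force counting (in place of Apriori-style pruning), and the scoring by summed confidence. The paper's CDD measure is not defined in this record and is not implemented.

```python
from collections import defaultdict
from itertools import combinations


def filter_terms(weights, threshold):
    """Keep only terms whose weight reaches the threshold (preprocessing)."""
    return frozenset(t for t, w in weights.items() if w >= threshold)


def mine_rules(transactions, labels, min_support, min_confidence, max_len=2):
    """Mine rules 'itemset -> class' by brute-force counting (assumption)."""
    itemset_count = defaultdict(int)
    rule_count = defaultdict(int)
    for items, label in zip(transactions, labels):
        for k in range(1, max_len + 1):
            for subset in combinations(sorted(items), k):
                itemset_count[subset] += 1
                rule_count[(subset, label)] += 1
    n = len(transactions)
    rules = []
    for (subset, label), hits in rule_count.items():
        support = hits / n                        # share of all documents
        confidence = hits / itemset_count[subset] # share of matching documents
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(subset), label, support, confidence))
    return rules


def classify(rules, items, default=None):
    """Score each class by the summed confidence of its matching rules.

    This scoring is a stand-in chosen for the sketch, not the paper's CDD.
    """
    scores = defaultdict(float)
    for antecedent, label, _support, confidence in rules:
        if antecedent <= items:
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else default


# Hypothetical training data: keyword weights per document plus a class label.
training = [
    ({"stock": 0.9, "market": 0.6, "the": 0.05}, "finance"),
    ({"stock": 0.8, "bank": 0.7, "the": 0.04}, "finance"),
    ({"team": 0.9, "match": 0.8, "the": 0.05}, "sports"),
    ({"match": 0.7, "champion": 0.6, "market": 0.2}, "sports"),
]
THRESHOLD = 0.3  # hypothetical feature-weight threshold

transactions = [filter_terms(w, THRESHOLD) for w, _ in training]
labels = [label for _, label in training]
rules = mine_rules(transactions, labels, min_support=0.5, min_confidence=0.8)

unknown = filter_terms({"stock": 0.7, "fund": 0.5, "the": 0.03}, THRESHOLD)
predicted = classify(rules, unknown)
```

Because each rule is a readable "keywords -> class" pair with its support and confidence, the classifier stays inspectable, which matches the abstract's emphasis on understandable rules. Raising the weight threshold shrinks each transaction and therefore the number of candidate itemsets to count.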
