
Associative text classification based on restructured rule
Cited by: 2
Abstract: After studying how existing associative classification algorithms perform in text classification, we found that on structured text data their classification accuracy is unsatisfactory because they ignore the semantic information of the text. We therefore propose an associative text classification method based on restructured rules (RARC). Using a term co-occurrence model, RARC starts from the classification rules that have already been mined and combines term pairs with a high degree of co-occurrence, restructuring the mined rules into structured classification rules that carry the semantic information of the text; these rules are then used to classify new documents. Experimental results show that RARC achieves higher classification accuracy than other associative text classification methods (ARC).
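The abstract does not specify the co-occurrence measure, the threshold, or how a restructured rule is matched against a new document, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' RARC implementation. It assumes: rules have already been mined (for example with an Apriori-style miner) as term sets with a class label and confidence; co-occurrence degree is a Jaccard score over documents; term pairs at or above a hypothetical threshold of 0.5 are greedily merged into a single compound item; a compound item matches only when both terms occur within a hypothetical window of 3 tokens; and classification uses the highest-confidence matching rule. All names (`Rule`, `cooccurrence_degree`, `restructure`, `item_matches`, `classify`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    # Each item is a tuple of terms: a 1-tuple is a plain term,
    # a 2-tuple is a restructured (high co-occurrence) term pair.
    items: frozenset
    label: str
    confidence: float


def cooccurrence_degree(docs):
    """Return degree(a, b) = |D(a) & D(b)| / |D(a) | D(b)| (assumed Jaccard measure)."""
    df = Counter()       # number of documents containing each term
    pair_df = Counter()  # number of documents containing each unordered term pair
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))

    def degree(a, b):
        both = pair_df[frozenset((a, b))]
        union = df[a] + df[b] - both
        return both / union if union else 0.0

    return degree


def restructure(terms, label, confidence, degree, threshold=0.5):
    """Greedily merge the most strongly co-occurring term pairs of a mined rule."""
    remaining = set(terms)
    items = []
    pairs = sorted(combinations(sorted(remaining), 2),
                   key=lambda p: degree(*p), reverse=True)
    for a, b in pairs:
        if a in remaining and b in remaining and degree(a, b) >= threshold:
            items.append((a, b))
            remaining -= {a, b}
    items.extend((t,) for t in sorted(remaining))  # leftover single terms
    return Rule(frozenset(items), label, confidence)


def item_matches(item, tokens, window=3):
    """A single term must occur; a pair must occur within `window` tokens."""
    if len(item) == 1:
        return item[0] in tokens
    a, b = item
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def classify(tokens, rules, default=None):
    """Return the label of the highest-confidence rule whose items all match."""
    for rule in sorted(rules, key=lambda r: r.confidence, reverse=True):
        if all(item_matches(item, tokens) for item in rule.items):
            return rule.label
    return default


if __name__ == "__main__":
    docs = [["data", "mining", "rule", "class"], ["data", "mining", "text"],
            ["stock", "market", "price"], ["stock", "market", "data"]]
    deg = cooccurrence_degree(docs)
    rules = [restructure(["data", "mining"], "cs", 0.9, deg),
             restructure(["stock", "market"], "finance", 0.8, deg)]
    print(classify(["mining", "the", "data", "of", "text"], rules))
```

Here "data" and "mining" co-occur in 2 of the 3 documents containing either term (degree 2/3), so the first rule is restructured into the single compound item `("data", "mining")`. A document that contains both words but far apart no longer fires that rule, which is the kind of structural, semantics-aware constraint the abstract describes; the exact matching condition used in the paper may differ.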
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2009, No. 3, pp. 624-626, 630 (4 pages)
Funding: National Torch Program Project (2004EB33006); Jiangsu Provincial Natural Science Guidance Program for Higher Education Institutions (05JKD520050)
Keywords: term co-occurrence; vector space model; rule restructuring; associative classification; text classification