Automatic Domain-Specific Term Extraction and Its Application in Text Classification

Cited by: 31
Abstract: This paper proposes a statistical method based on information entropy for extracting domain-specific terms from a corpus whose documents are labeled by domain. The method accounts both for how unevenly a candidate term is distributed across domain categories and for how evenly it is distributed within a particular domain category, and it adds a normalization step to cope with unbalanced corpora. Manual evaluation shows that the method characterizes domain-specific terms more precisely and extracts them more effectively than previous term extraction approaches. The extracted terms are also used as the feature space for text classification, in place of traditional feature selection algorithms; experiments show that this yields significantly better classification performance.
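Source: Acta Electronica Sinica (《电子学报》), 2007, No. 2, pp. 328-332 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (No. 60673037)
Keywords: domain-specific term; information entropy; normalization; text classification; feature selection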
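
The abstract names the ingredients of the method (entropy across domains, entropy within a domain, normalization for unbalanced corpora) but does not give the scoring formula. The Python sketch below is therefore only an illustration of how such ingredients could be combined, not the authors' algorithm. The function names (`entropy`, `normalized_entropy`, `term_scores`), the use of relative frequencies as the normalization step, and the product used to combine the three factors are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def normalized_entropy(probs, n):
    """Entropy divided by its maximum, log(n); defined as 0.0 when n < 2."""
    return entropy(probs) / math.log(n) if n > 1 else 0.0


def term_scores(corpus):
    """Score (term, domain) pairs; higher means more domain-specific.

    corpus: dict mapping a domain name to a list of tokenized documents.
    The scoring formula is a hypothetical combination, not taken from the paper.
    """
    domains = list(corpus)
    doc_counts = {d: [Counter(doc) for doc in corpus[d]] for d in domains}
    domain_tf = {d: sum(doc_counts[d], Counter()) for d in domains}
    domain_size = {d: sum(domain_tf[d].values()) for d in domains}

    scores = {}
    vocab = set().union(*domain_tf.values())
    for term in vocab:
        # Assumed normalization: relative frequencies, so that a large
        # domain does not dominate simply because it has more tokens.
        rel = {d: domain_tf[d][term] / domain_size[d] for d in domains}
        total = sum(rel.values())
        # Low entropy across domains = unevenly distributed = more specific.
        between = normalized_entropy(
            [r / total for r in rel.values()], len(domains)
        )
        for d in domains:
            tf = domain_tf[d][term]
            if tf == 0:
                continue
            # High entropy within the domain = evenly spread over its documents.
            within = normalized_entropy(
                [c[term] / tf for c in doc_counts[d]], len(doc_counts[d])
            )
            share = rel[d] / total
            scores[(term, d)] = share * (1.0 - between) * within
    return scores


# Toy corpus, invented for illustration only.
corpus = {
    "sports": [["goal", "team", "the"], ["team", "goal", "the"]],
    "finance": [["stock", "market", "the"], ["market", "stock", "the"]],
}
scores = term_scores(corpus)
for (term, domain), s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {domain:8s} {s:.3f}")
```

In this toy corpus a word such as "the", which is spread equally over both domains, receives a score near zero, while a word confined to one domain and present in all of its documents receives the highest score. Terms ranked this way could then serve as the feature set for a text classifier, which is the application the abstract describes.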
