A Domain Concept Clustering Method Based on the Verb Dependency Set (基于动词依存集的领域概念聚类方法)
Cited by: 2
Abstract: To enable effective concept clustering on small-scale, domain-specific corpora, a domain concept clustering method based on a verb dependency set is proposed. The method relies on the observation that domain concepts of the same class co-occur with specific domain verbs, and the verb dependency set is compiled with the assistance of domain experts. For each concept that co-occurs with the verb dependency set in subject-predicate and verb-object structures, a verb dependency degree is computed, and concepts whose degree exceeds a threshold are grouped into one cluster. Experiments show that the method is practical on small-scale domain-specific corpora, and that the concept coincidence rate of its clusters is better than that of LSI-based and search-engine-based concept clustering methods.
Authors: LIU Li (刘里), XIAO Yingyuan (肖迎元)
Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2015, No. 7, pp. 949-953 (5 pages)
Funding: National Natural Science Foundation of China (61202169, 61301140); Tianjin "131" Innovative Talent Training Project
Keywords: clustering method; corpus; verb dependency set; dependency parsing; domain concept; concept coincidence rate
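
The clustering step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula for the verb dependency degree, so the sketch substitutes a simple ratio (the share of a concept's subject-predicate and verb-object arcs whose verb belongs to the verb dependency set). The relation labels `SBV` and `VOB`, the function names, the use of one verb set per cluster, the threshold of 0.5, and the toy English triples are all hypothetical.

```python
from collections import defaultdict

# Assumed labels for subject-predicate and verb-object arcs
# (hypothetical; the source does not name a parser or tag set).
RELATIONS = {"SBV", "VOB"}


def verb_dependency_degree(triples, verb_set):
    """Return {concept: share of its SBV/VOB arcs whose verb is in verb_set}.

    This ratio is an illustrative stand-in, not the paper's formula.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for concept, relation, verb in triples:
        if relation not in RELATIONS:
            continue  # only the two structures named in the abstract count
        total[concept] += 1
        if verb in verb_set:
            hits[concept] += 1
    return {c: hits[c] / total[c] for c in total}


def cluster_by_verb_set(triples, verb_sets, threshold=0.5):
    """Group, per verb set, the concepts whose degree exceeds the threshold."""
    clusters = {}
    for name, verb_set in verb_sets.items():
        degrees = verb_dependency_degree(triples, verb_set)
        clusters[name] = sorted(c for c, d in degrees.items() if d > threshold)
    return clusters


# Toy (concept, relation, verb) triples, as a dependency parser might emit.
triples = [
    ("aspirin", "VOB", "take"),
    ("aspirin", "VOB", "prescribe"),
    ("aspirin", "SBV", "relieve"),
    ("ibuprofen", "VOB", "take"),
    ("ibuprofen", "VOB", "prescribe"),
    ("ibuprofen", "SBV", "relieve"),
    ("fever", "SBV", "subside"),
    ("fever", "VOB", "relieve"),
    ("fever", "ATT", "take"),  # skipped: neither SBV nor VOB
]
# Hypothetical expert-curated verb dependency sets, one per target cluster.
verb_sets = {"drug": {"take", "prescribe"}, "symptom": {"subside", "relieve"}}

clusters = cluster_by_verb_set(triples, verb_sets, threshold=0.5)
print(clusters)
```

Because the verb sets are curated by hand rather than estimated from data, a scheme of this kind needs no large co-occurrence statistics, which is consistent with the abstract's claim that the method suits small corpora.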
