
Subject Concepts Extraction Based on Conceptual Quantified Relations (original Chinese title: 基于词语量化关系的主题概念抽取算法研究)
Cited by: 2
Abstract: With the rapid growth of online resources, automatically processing massive volumes of text poses a major challenge, and text subject extraction is an important research topic in automatic text processing. This paper presents a subject concept extraction algorithm based on quantified relations between words. First, a conceptual vector space model is built on top of word clustering, and concept weights are computed by weighting with the word similarity taken from HowNet. Then, using the quantified relations between words in a Chinese dictionary, the subject importance of each concept is obtained as the dot product of the concept's relation vector and the weight vector. Finally, the concepts that reflect the subject of the text are extracted according to their importance. The experimental results indicate that this algorithm achieves higher precision than traditional word-frequency statistics.
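The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that the concept weights are already computed, that the dictionary-derived quantified relations are available as a square matrix, and that importance is the plain dot product of each concept's relation row with the weight vector. The function names, the example concepts, and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def subject_importance(weights: Sequence[float],
                       relations: Sequence[Sequence[float]]) -> List[float]:
    """Score each concept as the dot product of its relation vector and the weight vector.

    weights[j]      -- assumed, precomputed weight of concept j
    relations[i][j] -- assumed quantified relation between concepts i and j
    """
    if any(len(row) != len(weights) for row in relations):
        raise ValueError("each relation vector must have one entry per concept")
    return [sum(r * w for r, w in zip(row, weights)) for row in relations]


def extract_subject_concepts(concepts: Sequence[str],
                             weights: Sequence[float],
                             relations: Sequence[Sequence[float]],
                             top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k concepts ranked by descending subject importance."""
    scores = subject_importance(weights, relations)
    ranked = sorted(zip(concepts, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical toy data, for illustration only.
concepts = ["economy", "market", "weather"]
weights = [0.5, 0.3, 0.2]
relations = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.0],
    [0.1, 0.0, 1.0],
]

if __name__ == "__main__":
    for name, score in extract_subject_concepts(concepts, weights, relations):
        print(name, round(score, 3))
```

In this toy setting a concept ranks high when it is strongly related to other heavily weighted concepts, not only when its own weight is high. That is one possible reading of why relation-aware scoring might differ from raw word-frequency counting.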
Source: Computer Simulation (《计算机仿真》), CSCD, PKU Core Journal, 2009, No. 12, pp. 122-125 (4 pages)
Funding: National Natural Science Foundation of China (60496326)
Keywords: subject concept; conceptual quantified relation; conceptual vector space model

