Abstract
With the rapid growth of Internet resources, automatically processing massive amounts of text faces great challenges, and subject extraction is an important research problem in automatic text processing. This paper presents a subject concept extraction algorithm based on quantified relations between words. First, a conceptual vector space model is built on top of word clustering, and the weight of each concept is computed by weighting with the word similarity from HowNet. Then, using the quantified relations between words in a dictionary, the subject importance of each concept is obtained as the product of its relation vector and the weight vector. Finally, the concepts that reflect the subject of the text are extracted according to their importance. Experimental results show that the algorithm achieves higher precision than traditional word-frequency statistics.
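The last two steps of the algorithm (scoring each concept by the product of its relation vector with the weight vector, then extracting by importance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `top_k` cutoff, and all numeric values are hypothetical, and the concept weights and quantified relations are assumed to have been computed already (in the paper they come from HowNet similarity and a dictionary, respectively).

```python
def subject_importance(weights, relations):
    """Score concept i as the dot product of its relation row with the weight vector."""
    return [sum(r * w for r, w in zip(row, weights)) for row in relations]


def extract_subject_concepts(concepts, weights, relations, top_k=2):
    """Return the top_k concepts ranked by subject importance (top_k is an assumed cutoff)."""
    scores = subject_importance(weights, relations)
    ranked = sorted(zip(concepts, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Toy data, invented for illustration only.
concepts = ["network", "text", "weather"]
weights = [0.5, 0.4, 0.1]          # assumed concept weights
relations = [                      # assumed quantified relations between concepts
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]

scores = subject_importance(weights, relations)
top = extract_subject_concepts(concepts, weights, relations, top_k=2)
print(top)
```

In this toy setting a concept scores highly when it is strongly related to other heavily weighted concepts, which is the intuition behind preferring this score over raw word frequency.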
Source
Computer Simulation (《计算机仿真》)
CSCD
PKU Core Journal
2009, Issue 12, pp. 122-125 (4 pages)
Funding
National Natural Science Foundation of China (60496326)
Keywords
Subject concept
Conceptual quantified relation
Conceptual vector space model