Abstract
[Objective] To improve the accuracy of network-based text feature extraction, this paper uses dependency parsing to build a more accurate text network. [Methods] Semantic associations between feature words are determined from the dependency parsing results, and the direction of each association is determined by the dependency direction between the feature words. An improved PageRank algorithm is then used to calculate node importance, which serves as the criterion for feature extraction. [Results] Experimental results show that, compared with a co-word network, feature extraction based on the dependency parsing network improves text clustering performance to some extent. [Limitations] Different dependency types are not distinguished when dependency relations are used to determine the direction of association between feature words. [Conclusions] The proposed text feature extraction method based on a dependency parsing network is effective.
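The pipeline described under [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how PageRank was improved, so standard PageRank with uniform redistribution of dangling-node mass is swapped in here. The edge direction (dependent word pointing to its head word), the function names `build_graph`, `pagerank`, and `top_features`, and the sample dependency pairs are all assumptions made for illustration. A real system would obtain the pairs from a dependency parser.

```python
def build_graph(dependencies):
    """Build a directed word graph from (dependent, head) pairs.

    Assumption: each edge points from the dependent word to its head.
    The abstract only says direction follows the dependency direction.
    """
    graph = {}
    for dependent, head in dependencies:
        if dependent == head:
            continue  # ignore self-loops
        graph.setdefault(dependent, set()).add(head)
        graph.setdefault(head, set())  # make sure the head is a node
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (not the paper's improved variant)."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for target in graph[v]:
                    new_rank[target] += share
        rank = new_rank
    return rank


def top_features(dependencies, k):
    """Return the k words with the highest node importance."""
    rank = pagerank(build_graph(dependencies))
    return sorted(rank, key=lambda w: (-rank[w], w))[:k]


# Hypothetical (dependent, head) pairs, as a parser might emit them.
deps = [
    ("text", "network"),
    ("feature", "network"),
    ("accurate", "network"),
    ("network", "builds"),
]
scores = pagerank(build_graph(deps))
```

In this sketch, words that many other words depend on accumulate importance, so selecting the top-ranked nodes acts as the feature extraction step. Distinguishing dependency types, which the abstract lists as a limitation, could be approximated by weighting edges, but that is beyond what the source describes.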
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2014, Issue 11, pp. 31-37 (7 pages)
New Technology of Library and Information Service
Funding
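One of the research outcomes of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194)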
Keywords
Feature extraction
Dependency parsing
Complex network