
基于依存句法网络的文本特征提取研究 (cited by: 10)

Research of Text Feature Extraction on Dependency Parsing Network
Abstract [Objective] To improve the accuracy of network-based text feature extraction by using dependency parsing to build a more accurate text network. [Methods] The semantic associations between feature words are determined from the dependency parsing results, and the direction of each association is taken from the dependency direction between the feature words. An improved PageRank algorithm then computes node importance, which serves as the criterion for feature extraction. [Results] Experiments show that, compared with a co-word network, feature extraction based on the dependency parsing network improves text clustering performance to some extent. [Limitations] When the dependency relation is used to determine the direction of association between feature words, the different dependency types are not distinguished. [Conclusions] The proposed text feature extraction method based on the dependency parsing network is effective.
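Authors: 唐晓波 (Tang Xiaobo), 肖璐 (Xiao Lu)
Source: 《现代图书情报技术》 (New Technology of Library and Information Service), CSSCI, Peking University Core Journal, 2014, No. 11, pp. 31-37 (7 pages)
Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (Grant No. 71273194)
Keywords: Feature extraction; Dependency parsing; Complex network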
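
The pipeline outlined in the abstract (dependency arcs between feature words, a directed network, node importance by PageRank, top-ranked nodes as features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not specify the "improved" PageRank, so plain weighted PageRank stands in for it; edges are assumed to point from the dependent word to its head; the arcs are hand-written, hypothetical examples rather than real parser output; and the function names `build_dependency_graph`, `pagerank`, and `top_features` are made up here.

```python
from collections import defaultdict


def build_dependency_graph(arcs):
    """Build a weighted directed graph from (dependent, head) word pairs.

    Assumption: an edge points from the dependent to its head, and
    repeated arcs between the same pair of words add to the edge weight.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for dependent, head in arcs:
        if dependent == head:
            continue  # ignore self-loops
        graph[dependent][head] += 1.0
        graph[head]  # touch the head so it exists as a node even without out-edges
    return {node: dict(neighbours) for node, neighbours in graph.items()}


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Plain weighted PageRank (a stand-in for the paper's improved variant)."""
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst, weight in graph[src].items():
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def top_features(rank, k):
    """Return the k highest-ranked words as (word, score) pairs."""
    ordered = sorted(rank.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:k]


# Hypothetical (dependent, head) arcs, standing in for dependency parser output.
example_arcs = [
    ("dependency", "parsing"),
    ("parsing", "network"),
    ("text", "network"),
    ("network", "extraction"),
    ("feature", "extraction"),
    ("text", "clustering"),
]

example_graph = build_dependency_graph(example_arcs)
for word, score in top_features(pagerank(example_graph), 3):
    print(f"{word}\t{score:.4f}")
```

A co-word network, the baseline named in the abstract, would instead link words that merely co-occur and carries no direction; in this sketch the direction comes only from the assumed dependent-to-head convention, so reversing the tuples in `example_arcs` changes which words rank highest.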

