Research on a Parse Tree-Based SAO Structure Identification Method (cited by: 5)

Parse Tree-based SAO Structure Identification
Abstract [Purpose/significance] Subject-Action-Object (SAO) is a triple structure that can describe topics in detail and express the relationships between them, and SAO analysis is a fast-growing research direction in bibliometrics. To obtain SAO structures that meet the needs of bibliometric analysis, three problems with existing SAO identification methods must be solved: low recall and precision, weak relevance of the identified SAO structures to domain topics, and matrix sparsity. [Method/process] This paper proposes a parse tree-based SAO structure identification method for bibliometric analysis. It first identifies the core components of SAO structures using a co-word (co-occurrence) algorithm and term clumping, and then applies a parse tree-based extraction algorithm to extract SAO structures layer by layer. [Result/conclusion] The case study shows that the proposed method achieves an average precision of 0.8058 and an average recall of 0.8446. The identified SAO structures are closely related to domain topics, and matrix sparsity is also mitigated, so the method can be applied effectively in bibliometric analysis.
Source: Library and Information Service (《图书情报工作》), indexed in CSSCI and the Peking University Core Journals list, 2016, Issue 21, pp. 113-121 (9 pages)
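Funding: This work is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Emerging Technology Innovation Path Prediction Based on Semantic TRIZ" (Grant No. 71373019) and the National High Technology Research and Development Program project "Big Data Intelligent Service System for Government Management and Its Application Demonstration" (Grant No. 2014AA015105).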
Keywords: subject-action-object (SAO) identification; parse tree; semantic analysis; co-word algorithm; term clumping
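The abstract describes extracting SAO structures layer by layer from a parse tree but gives no algorithmic detail. The sketch below is only an illustration of that general idea, not the authors' method. It assumes a Penn Treebank-style bracketed constituency parse as input, and it uses a simple hypothetical rule: every clause `S` that has an `NP` child and a `VP` child containing a verb and an `NP` yields one triple. The co-word and term clumping steps are not modeled, and the function names `parse_tree`, `words`, and `extract_sao` are invented for this example.

```python
def parse_tree(text):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def words(node):
    """Return the leaf words under a node, in sentence order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in words(child)]


def extract_sao(node, found=None):
    """Walk the tree top-down and collect (subject, action, object) triples.

    Assumed rule (illustrative only): S -> NP VP, where VP has a VB* child
    and an NP child. Nested clauses are visited by the recursion.
    """
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    label, children = node
    subtrees = [c for c in children if not isinstance(c, str)]
    if label == "S":
        np = next((c for c in subtrees if c[0] == "NP"), None)
        vp = next((c for c in subtrees if c[0] == "VP"), None)
        if np is not None and vp is not None:
            vp_parts = [c for c in vp[1] if not isinstance(c, str)]
            verb = next((c for c in vp_parts if c[0].startswith("VB")), None)
            obj = next((c for c in vp_parts if c[0] == "NP"), None)
            if verb is not None and obj is not None:
                found.append((" ".join(words(np)),
                              " ".join(words(verb)),
                              " ".join(words(obj))))
    for child in subtrees:
        extract_sao(child, found)
    return found


if __name__ == "__main__":
    sentence = ("(S (NP (NNS researchers)) (VP (VBP report) (SBAR (IN that) "
                "(S (NP (DT the) (NN catalyst)) (VP (VBZ improves) "
                "(NP (NN conversion) (NN efficiency)))))))")
    for triple in extract_sao(parse_tree(sentence)):
        print(triple)
```

In this example the outer clause has no direct object noun phrase, so only the embedded clause produces a triple. A real pipeline would obtain the parse from a syntactic parser and would filter subjects and objects against the identified core terms.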