Abstract
[Objective] Patent keyword indexing is a basic step in Chinese information processing and has high application value in patent retrieval, patent translation, and automatic patent summarization. [Methods] Patent documents are mapped to a complex network graph model using a K-nearest-neighbor coupled graph. An average connectivity weight metric is proposed that combines the change in average path length, the change in average clustering coefficient, and the current node's effect on flow through the whole graph model. Keyword position, keyword span, and inverse document frequency are then analyzed, and a patent comprehensive correlation feature is proposed to measure keyword importance. [Results] In experiments on patent documents from the sensor domain, precision reaches 60.9% at Top-8 and recall reaches 73.4% at Top-10. [Limitations] Low-frequency keywords are not handled well enough, which degrades the indexing results. [Conclusions] The experimental results indicate that the method is effective and useful for patent indexing.
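The graph step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formula for the average connectivity weight, so the code only computes two of its stated ingredients, the change in average path length and the change in average clustering coefficient when a node is removed. The function names, the window-based reading of "K-nearest-neighbor coupling" (each token linked to the next `k` tokens), and the choice to average path lengths over connected pairs only are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def build_knn_graph(tokens, k=2):
    """Undirected graph linking each token to the next k tokens (assumed coupling rule)."""
    graph = {t: set() for t in tokens}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + k]:
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def average_path_length(graph):
    """Mean BFS shortest-path length over connected node pairs (assumption for disconnected graphs)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than two neighbors count as 0."""
    if not graph:
        return 0.0
    coeffs = []
    for nbrs in graph.values():
        d = len(nbrs)
        if d < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        coeffs.append(2 * links / (d * (d - 1)))
    return sum(coeffs) / len(coeffs)


def remove_node(graph, node):
    """Copy of the graph without the given node and its edges."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


def removal_deltas(graph):
    """Per node: (change in average path length, change in average clustering) after removal."""
    base_l = average_path_length(graph)
    base_c = average_clustering(graph)
    deltas = {}
    for node in graph:
        reduced = remove_node(graph, node)
        deltas[node] = (
            average_path_length(reduced) - base_l,
            average_clustering(reduced) - base_c,
        )
    return deltas
```

A candidate keyword whose removal shifts these two quantities strongly is structurally important to the graph. How the paper weights the shifts, and how it combines them with the flow term and with the position, span, and inverse document frequency features, is not stated in the abstract and is left out here.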
Source
《现代图书情报技术》
CSSCI
2015, Issue 3, pp. 26-32 (7 pages)
New Technology of Library and Information Service
Funding
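This work is one of the research outcomes of the following projects:
National Natural Science Foundation of China project "Research on Ontology-Based Automatic Patent Indexing" (No. 61271304)
Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation, "Research on Domain-Oriented Precise Search Methods for Multimodal Internet Information" (No. KZ201311232037)
Innovation Team Building and Teacher Professional Development Program for Beijing Municipal Universities project "Theoretical Foundations and Intelligent Processing Technology for Big Data Content Understanding" (No. IDHT20130519)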
Keywords
Complex graph model
Topological potential
Keyword indexing
Average connectivity weight
Comprehensive correlation