Abstract
[Objective] To improve the efficiency of organizing and retrieving scientific and technical literature, this paper addresses the basic research problem of term extraction and proposes an automatic term extraction method that combines the characteristics of scientific and technical terms with statistical computation. [Methods] The method combines the linguistic characteristics of terms with statistical information, such as the combination strength between words and the positions at which terms appear in the literature, to build an automatic term extraction algorithm. [Results] Experiments show that the average accuracy of the extracted terms can reach 51.2%. [Limitations] The statistical algorithm and the data processing both need further work, to improve the algorithm and to raise data quality. [Conclusions] The proposed method, which combines term characteristics with statistical computation, is effective.
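The abstract names two statistical signals, the combination strength between words and the position at which a term appears, but gives no formulas. The sketch below is only an illustration of how such signals might be combined, not the paper's algorithm: it assumes pointwise mutual information (PMI) as a stand-in for combination strength and a hypothetical `title_weight` multiplier as the position signal. The function name `score_term_candidates` and the input layout are likewise assumptions.

```python
import math
from collections import Counter


def score_term_candidates(sentences, title_weight=2.0):
    """Score adjacent word pairs as candidate terms.

    sentences: list of (tokens, is_title) pairs, one per sentence.
    title_weight: hypothetical boost for pairs seen in a title
                  (an assumed position signal, not from the paper).
    """
    unigrams = Counter()
    bigrams = Counter()
    position = {}  # best position weight observed for each pair

    for tokens, is_title in sentences:
        unigrams.update(tokens)
        weight = title_weight if is_title else 1.0
        for pair in zip(tokens, tokens[1:]):
            bigrams[pair] += 1
            position[pair] = max(position.get(pair, 0.0), weight)

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        p_pair = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI as an assumed measure of combination strength,
        # clamped at zero so the position weight never amplifies
        # a negative association.
        pmi = max(math.log(p_pair / (p_a * p_b)), 0.0)
        scores[(a, b)] = pmi * position[(a, b)]
    return scores
```

Candidates can then be ranked by score and filtered with linguistic rules (for example, part-of-speech patterns), which is where the term characteristics mentioned in the abstract would come in.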
Source
《现代图书情报技术》
CSSCI
Peking University Core Journals (北大核心)
2014, No. 1, pp. 51-55 (5 pages)
New Technology of Library and Information Service
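Funding
This work is one of the research outcomes of the following projects:
National Key Technology R&D Program of China during the 12th Five-Year Plan, "Research on Key Technologies of Electric Vehicle Data Mining Based on Multi-Source Information" (Grant No. 2013BAG06B01)
National Natural Science Foundation of China, "Research on Key Issues in the Rapid Construction of Knowledge Organization Systems Supporting Specific Intelligence Analysis Applications" (Grant No. 71203208)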
Keywords
Technical term; Term characteristic; Statistical calculation; Automatic extraction