Abstract
[Purpose/significance] This paper studies a TF-IDF-based assisted indexing algorithm that draws on users' natural annotations, approaching the problem from the user's point of view. [Method/process] First, the author-assigned keywords and classification numbers of core journal papers are taken as source data; keyword frequencies are counted and the TF-IDF algorithm is used to build a user annotation vocabulary, which forms the indexing knowledge base. Next, the IKAnalyzer word segmentation software is used to segment the scientific and technological project data to be indexed and to remove stop words. Feature words are then extracted from the project data with the TF-IDF algorithm and a position-weighting algorithm. Finally, keywords and classification numbers are assigned to the project data in a single synchronized indexing step. [Result/conclusion] In the experiment, 68.1% of the project records had a similarity ratio of 60% or more between machine-assigned and human-assigned keywords, and for 83.9% of the records the first three characters of the machine-assigned classification number matched those of the human-assigned one. These results indicate that applying the TF-IDF algorithm to users' natural annotation data is feasible for both keyword indexing and classification indexing.
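The feature-word extraction step described in the abstract (TF-IDF combined with position weighting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the field names, the `POSITION_WEIGHTS` values, the smoothed IDF formula, and the fallback IDF for unseen terms are all assumptions, and the input is assumed to be already segmented into tokens (the paper uses IKAnalyzer for that step).

```python
import math
from collections import Counter

# Hypothetical position weights: a term in the title counts more than one in the body.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def inverse_document_frequency(corpus):
    """Compute a smoothed IDF for every term in a corpus of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def extract_keywords(fields, idf, stop_words=frozenset(), top_k=5):
    """Rank terms of one record by position-weighted TF multiplied by IDF.

    fields maps a field name (e.g. "title", "body") to its list of tokens.
    """
    weighted_tf = Counter()
    for field, tokens in fields.items():
        weight = POSITION_WEIGHTS.get(field, 1.0)
        for token in tokens:
            if token not in stop_words:
                weighted_tf[token] += weight

    total = sum(weighted_tf.values()) or 1.0
    # Assumption: terms unseen in the reference corpus get the largest known IDF.
    default_idf = max(idf.values(), default=1.0)
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in weighted_tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

In use, `inverse_document_frequency` would be run once over a reference corpus, and `extract_keywords` would then be called per project record with its segmented title and body. Restricting candidates to terms found in the user annotation vocabulary, as the abstract implies, would be a small filter added before scoring.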
Source
Library and Information Service (《图书情报工作》)
CSSCI
Peking University Core Journal (北大核心)
2018, No. 1, pp. 132-139 (8 pages)
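Funding
2016 National Social Science Fund of China project "Research on Discovering Review Experts for Scientific Research Projects Based on Knowledge Organization" (Grant No. 16BTQ079)
One of the research outputs of the 2017 Institute of Scientific and Technical Information of China Innovation Research Fund general project "Research on Dynamic Construction Methods of Knowledge Graphs for National Science and Technology Big Data" (Grant No. MS2017-06)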
Keywords
assisted indexing
user natural annotation
TF-IDF algorithm
information organization