Abstract
【Purpose/significance】This paper examines keyword-based clustering of scientific literature and asks two questions: how clustering results differ when keywords with different characteristics are used, and how keywords should be selected by characteristic to improve clustering. 【Method/process】Control groups were set up according to keyword frequency and semantic type for an empirical study, observing how these characteristics affect clustering density and the semantic representation of the literature. 【Result/conclusion】Clustering with only ultra-high-frequency, sub-high-frequency, research-topic, or scope-limiting keywords yields a suitable clustering density. The characteristics of ultra-high-frequency keywords are usually also reflected in the other frequency bands. Sub-high-frequency keywords can reflect the characteristics of keywords from different frequency bands at the same time, although they do not represent mid-frequency keywords comprehensively. Clustering with keywords of different semantic types kept separate works better than clustering with them combined, and keywords of different semantic types are mutually exclusive. 【Innovation/limitation】The paper finds that, when clustering is based on co-occurrence relationships between keywords, selecting only sub-high-frequency keywords or only keywords of a single semantic category gives good results; however, the semantic structural relationships between keywords are not investigated further.
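The procedure described in the abstract (select keywords by frequency band, link documents through shared keywords, then measure density) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy documents, the frequency thresholds, the function names, and the use of plain network density as the density measure are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def keyword_frequencies(docs):
    """Count in how many documents each keyword appears."""
    return Counter(kw for kws in docs.values() for kw in set(kws))


def select_by_frequency(freqs, low, high):
    """Keep keywords whose document frequency lies in [low, high].

    The band limits are hypothetical; the abstract does not state thresholds.
    """
    return {kw for kw, n in freqs.items() if low <= n <= high}


def document_network(docs, selected):
    """Link two documents when they share at least one selected keyword."""
    edges = set()
    for a, b in combinations(sorted(docs), 2):
        if set(docs[a]) & set(docs[b]) & selected:
            edges.add((a, b))
    return edges


def density(n_nodes, edges):
    """Undirected network density: actual links divided by possible links."""
    if n_nodes < 2:
        return 0.0
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


# Toy corpus (invented for illustration only).
docs = {
    "d1": ["clustering", "keyword", "word frequency"],
    "d2": ["clustering", "keyword", "semantics"],
    "d3": ["clustering", "semantics", "network"],
    "d4": ["clustering", "network", "word frequency"],
}

freqs = keyword_frequencies(docs)
top_band = select_by_frequency(freqs, 4, 4)   # keyword present in every document
mid_band = select_by_frequency(freqs, 2, 3)   # keywords present in 2 or 3 documents

top_density = density(len(docs), document_network(docs, top_band))
mid_density = density(len(docs), document_network(docs, mid_band))
print(top_density, mid_density)
```

In this toy corpus the single keyword shared by every document links all pairs, so the network is fully connected and carries no grouping information, while the lower band links only some pairs. This is only meant to show why the choice of frequency band changes the density of the resulting document network.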
Authors
叶佳鑫
熊回香
杨滋荣
童兆莉
YE Jia-xin; XIONG Hui-xiang; YANG Zi-rong; TONG Zhao-li (School of Information Management, Central China Normal University, Wuhan 430079, China; School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China)
Source
《情报科学》
CSSCI
Peking University Core Journal (北大核心)
2021, Issue 8, pp. 156-163 (8 pages)
Information Science
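Funding
National Social Science Fund of China annual project "Research on Online Academic Resource Mining and Recommendation Integrating Knowledge Graphs and Deep Learning" (19BTQ005).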
Keywords
word frequency
keyword semantics
scientific literature clustering
social network analysis