Abstract
To address the low efficiency and high difficulty of identifying technical topics in patents, this paper proposes FPC-Kmeans++ (Kmeans Plus Plus with Feature Phrase Clusters), a method for patent clustering analysis and technical topic identification. Its key idea is to use feature phrases, rather than conventional word segmentation results, as the basis for patent data analysis. The method is tested empirically on unmanned aerial vehicle (UAV) patents. The experimental results show that, compared with the traditional Kmeans++ (Kmeans Plus Plus) and LDAKmeans++ (Kmeans Plus Plus with Latent Dirichlet Allocation) methods, the proposed method determines the optimal number of topics more accurately and produces more clearly separated clusters, demonstrating its advantages in patent topic identification. In addition, the proposed NER-FPP (Named Entity Recognition with Feature Phrase Probability) algorithm outperforms the other algorithms compared in extracting patent feature phrases, achieving the highest F1 score, 93.36%.
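The core idea of the abstract, clustering patents that are represented by feature phrases instead of segmented words, can be illustrated with a minimal sketch. This is not the authors' FPC-Kmeans++ implementation: the abstract does not specify the vectorization, the distance measure, or how the optimal number of topics is chosen, so the code below assumes L2-normalised phrase-count vectors, squared Euclidean distance, plain k-means++ seeding followed by Lloyd iterations, and a fixed k. All function names and the sample phrases are hypothetical and invented for illustration.

```python
import math
import random
from collections import Counter


def phrase_vectors(docs):
    """Map each document (a list of feature phrases) to an L2-normalised count vector."""
    vocab = sorted({phrase for doc in docs for phrase in doc})
    index = {phrase: i for i, phrase in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for phrase, count in Counter(doc).items():
            vec[index[phrase]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: draw each new centre with probability proportional to D(x)^2."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        r = rng.uniform(0.0, sum(d2))
        acc, chosen = 0.0, points[-1]  # fallback if every distance is zero
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                chosen = p
                break
        centres.append(chosen)
    return centres


def lloyd(points, centres, iters=50):
    """Standard Lloyd iterations; returns (labels, inertia)."""
    k = len(centres)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centres[j])  # keep an empty cluster's centre unchanged
        if updated == centres:
            break
        centres = updated
    inertia = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def cluster_phrases(docs, k, seed=0, n_init=10):
    """Run k-means++ n_init times and keep the labelling with the lowest inertia."""
    rng = random.Random(seed)
    points = phrase_vectors(docs)
    runs = (lloyd(points, kmeanspp_init(points, k, rng)) for _ in range(n_init))
    labels, _ = min(runs, key=lambda run: run[1])
    return labels


# Invented toy data: each "patent" is already reduced to a list of feature phrases.
docs = [
    ["flight control system", "attitude sensor"],
    ["flight control system", "attitude sensor", "flight control system"],
    ["battery pack", "charging station"],
    ["charging station", "battery pack", "battery pack"],
]
labels = cluster_phrases(docs, k=2)
print(labels)
```

Treating a multi-word phrase such as "flight control system" as a single feature keeps it from being split into generic tokens like "system", which is the motivation the abstract gives for replacing word segmentation results. The repeated restarts are a common safeguard against a poor random seeding and are likewise an assumption here, not something stated in the source.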
Authors
LIU Jun (刘俊); WANG Xiulai (王修来)
School of Computer, Nanjing University of Information Science and Technology, Nanjing 210044, China; Nanjing Jinling Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing 210016, China
Source
Software Engineering (《软件工程》)
2024, No. 5, pp. 14-20 (7 pages)
Funding
General Program of the National Social Science Fund of China, 2022 (22BGL282).