Abstract
With the continued development of self-media technology, efficiently mining information from short text data has become a current research focus. Traditional topic mining methods analyze short texts using only the frequency of individual words and ignore semantic association structure, so their results are unsatisfactory. To address the scarcity of short text data, this paper proposes a topic mining analysis model based on social network analysis and LDA. First, a co-word analysis algorithm is used to analyze the relationships between topic words across documents. Next, a social network analysis algorithm is applied to increase the coupling among topic words in the co-word network. The implicit space model is then used to reduce the dimensionality of the co-word network, improving the coupling of the social network. Finally, the hidden location clustering algorithm is used to discover potential communities and improve topic recognition. Experimental results show that the proposed method improves, to a certain extent, the effectiveness of topic mining algorithms in identifying short text topics and facilitates short text research. It has practical value and can also serve as a reference for later application to frontier topic recognition.
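The co-word analysis step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the short texts are already tokenized and simply counts how often two distinct terms appear in the same document; the function name `build_coword_counts` and the sample documents are hypothetical and are not taken from the paper, whose actual preprocessing and weighting are not described here.

```python
from collections import Counter
from itertools import combinations


def build_coword_counts(docs):
    """Count how often each unordered pair of distinct terms shares a document."""
    pair_counts = Counter()
    for tokens in docs:
        # Deduplicate and sort so each pair has one canonical ordering per document.
        terms = sorted(set(tokens))
        pair_counts.update(combinations(terms, 2))
    return pair_counts


# Hypothetical pre-tokenized short texts (illustrative data, not from the paper).
docs = [
    ["topic", "mining", "short", "text"],
    ["short", "text", "network"],
    ["topic", "network", "mining"],
]
coword = build_coword_counts(docs)
```

The resulting pair counts can serve as weighted edges of a co-word network, which is the kind of structure that social network analysis and community detection would then operate on.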
Authors
武帅
施奕
杨秀璋
项美玉
WU Shuai; SHI Yi; YANG Xiuzhang; XIANG Meiyu (School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China; Lianshui County High-level Talent Development Center, Huaian 223200, China; Guiyang Institute for Big Data and Finance, Guizhou University of Finance and Economics, Guiyang 550025, China)
Source
《现代电子技术》
2022, Issue 20, pp. 124-128 (5 pages)
Modern Electronics Technique
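Funding
Guizhou Provincial Science and Technology Program Project (黔科合基础[2019]1041)
Guizhou Provincial Science and Technology Program Project (黔科合基础[2019]1403)
Guizhou Provincial Science and Technology Program Project (黔科合基础[2020]1Y279)
Guizhou Provincial Science and Technology Program Project (黔科合基础[2020]1Y420)
Guizhou Provincial Department of Education Youth Science and Technology Talent Growth Project (黔教合KY字[2016]175, 黔教合KY字[2021]135)
Guizhou University of Finance and Economics 2019 University-level Project (2019XQN01)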
Keywords
LDA topic mining
co-word analysis
social network analysis
short text mining
implicit space model
hidden location clustering
frontier theme recognition
Gibbs sampling