Abstract
Link prediction between online users has not fully exploited the features of users' text content. To address this, a second-stage feature extraction method based on the similarity of users' topics of interest is proposed. A topic model first yields the topic distribution of each user; each user's set of interest topics is then extracted from the differing proportions the user assigns to the topics, and a group of topic-similarity features is built from these sets for link prediction. Unlike traditional methods, which use the users' topic distributions directly, the method mines the similarity features of user text content a second time, so that the topical features have predictive power comparable to that of structural features and serve as an effective complement to them. Experimental results on a dataset collected from Sina Microblog show that topical features used on their own generally predict better than structural features, and that joint prediction improves the F-measure of structural features by up to 3%.
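The pipeline described in the abstract (topic distribution, then interest-topic set, then similarity features) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the selection rule or the exact feature definitions, so the threshold rule (keep topics whose weight exceeds the uniform share 1/K) and the three features (`common_topics`, `jaccard`, `shared_weight`) are hypothetical, as are the function names. The topic distributions are assumed to come from an already fitted topic model.

```python
from typing import Dict, List, Set


def interest_topics(theta: List[float], ratio: float = 1.0) -> Set[int]:
    """Return the indices of a user's interest topics.

    Assumed rule (not from the paper): a topic counts as an interest topic
    when its weight exceeds `ratio` times the uniform share 1/K.
    """
    k = len(theta)
    return {i for i, p in enumerate(theta) if p > ratio / k}


def topic_similarity_features(
    theta_u: List[float], theta_v: List[float], ratio: float = 1.0
) -> Dict[str, float]:
    """Build illustrative topic-similarity features for a user pair (u, v)."""
    a = interest_topics(theta_u, ratio)
    b = interest_topics(theta_v, ratio)
    common = a & b
    union = a | b
    return {
        # Number of interest topics the two users share.
        "common_topics": float(len(common)),
        # Jaccard overlap of the two interest-topic sets.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Total weight both users place on the shared topics.
        "shared_weight": sum(min(theta_u[i], theta_v[i]) for i in common),
    }


if __name__ == "__main__":
    # Toy topic distributions over K = 4 topics (made-up values).
    u = [0.5, 0.3, 0.1, 0.1]
    v = [0.1, 0.4, 0.4, 0.1]
    print(topic_similarity_features(u, v))
```

Feature vectors of this kind could then be concatenated with structural features (for example, common-neighbor counts) and passed to any binary classifier for the joint prediction the abstract mentions.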
Source
《西安交通大学学报》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2016, Issue 8, pp. 103-109 (7 pages)
Journal of Xi'an Jiaotong University
Funding
National Basic Research Program of China (2013CB329600)
National Natural Science Foundation of China (61372191, 61572492, 61472433)
Keywords
link prediction
user-generated content
topic model
feature similarity