Abstract
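In microblogging systems, finding high-quality users to follow is a prerequisite for obtaining high-quality information. This paper studies the problem of discovering high-quality microblog users: given a query of domain terms, the system returns a list of relevant users ranked by user quality. The problem is decomposed into two sub-problems: retrieving domain-relevant users, and ranking microblog users. For user retrieval, the paper proposes a user representation based on user tags and a Wikipedia-based method for matching query-user similarity. As an extension of ESA (explicit semantic analysis), the method yields readily interpretable results, and experiments show that retrieval based on Wikipedia outperforms retrieval based on other resources. For user ranking, the paper proposes UBRank, a graph-based iterative ranking method that considers both the number of messages a user publishes and the authority of those messages when computing user quality, and that uses only messages containing URLs to build the graph. Experiments confirm the method's efficiency and superiority.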
In microblogging systems such as Twitter or Weibo, finding high-quality users to follow is essential for acquiring information. This paper focuses on the task of high-quality user identification, i.e., given a domain query, return a list of users ranked by user quality. We divide the task into two sub-problems: user search and user ranking. For user search, we represent users by their tags and propose a similarity-based retrieval approach using the Chinese Wikipedia, which is essentially an extension of the existing ESA (explicit semantic analysis) method. For user ranking, we propose a graph-based ranking method called UBRank, which considers both the quantity and the quality of a user's published posts to measure user importance. Experiments indicate that using the Chinese Wikipedia works better than using other resources such as HowNet, and validate the efficiency and superiority of the ranking method.
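The user-search step can be illustrated with a minimal ESA-style sketch. It is not the authors' implementation: the term-to-concept weights in CONCEPT_WEIGHTS are a hypothetical toy table standing in for weights that would be derived from Chinese Wikipedia articles, and the function names (concept_vector, cosine, search_users) are assumptions made for illustration. The sketch maps a query and each user's tags into a shared concept space and ranks users by cosine similarity.

```python
import math
from collections import defaultdict

# Hypothetical toy term -> {concept: weight} table. In ESA these weights
# would come from a Wikipedia-derived index; the values here are invented.
CONCEPT_WEIGHTS = {
    "python": {"Programming language": 0.9, "Software": 0.4},
    "java": {"Programming language": 0.8, "Software": 0.5},
    "basketball": {"Sport": 0.9},
    "nba": {"Sport": 0.8, "Basketball league": 0.7},
}


def concept_vector(terms):
    """Sum the concept weights of all known terms into one sparse vector."""
    vec = defaultdict(float)
    for term in terms:
        for concept, weight in CONCEPT_WEIGHTS.get(term.lower(), {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search_users(query_terms, user_tags):
    """Return (user, score) pairs sorted by descending query-user similarity."""
    query_vec = concept_vector(query_terms)
    scored = [
        (user, cosine(query_vec, concept_vector(tags)))
        for user, tags in user_tags.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical users and tags, for illustration only.
    users = {"alice": ["java", "python"], "bob": ["nba"]}
    for user, score in search_users(["python"], users):
        print(user, round(score, 3))
```

Because each dimension of the vector is a named concept, the concepts that contribute most to a match can be inspected directly, which is one way to read the interpretability claim in the abstract. The ranking step (UBRank) is not sketched here, since the abstract does not specify its update rule.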
Authors
叶永君
李鹏
周美林
万仪方
王斌
YE Yongjun; LI Peng; ZHOU Meilin; WAN Yifang; WANG Bin (Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《中文信息学报》
CSCD
PKU Core Journal
2018, No. 7, pp. 109-115 (7 pages)
Journal of Chinese Information Processing
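Funding
National Natural Science Foundation of China (61402466)
National High Technology Research and Development Program of China (863 Program) (2015AA016005)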