Abstract
A filtering method based on the microblog forwarding set is proposed to suit the characteristics of microblog text. Triples of the form <substring, frequency, forwarding time difference> are built from the microblog forwarding set to form a user requirement template. Using HowNet as the knowledge source, the similarity between each microblog text and the user requirement template is computed, and the texts that interest the user are extracted into a candidate set. The candidate set is then screened further with the proposed triple-based microblog weight calculation method, yielding the microblog texts the user needs. Experimental results show that the forwarding-set-based method significantly improves on the keyword-based and KNN-based methods in both filtering precision and filtering recall.
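As an illustration only, the triple construction named in the abstract might be sketched as follows. The abstract does not say how substrings are chosen or how the time difference is measured, so everything here is an assumption: substrings are taken to be character n-grams, the time difference is taken to be the mean delay in seconds between the original post and the user's forward, and the name `build_triples` and the `(text, posted_at, forwarded_at)` record layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def build_triples(forwards, n=2):
    """Return {substring: (frequency, mean_delay_seconds)}.

    forwards: iterable of (text, posted_at, forwarded_at) tuples.
    Assumption: substrings are character n-grams; the paper's actual
    substring extraction is not described in the abstract.
    """
    count = defaultdict(int)
    total_delay = defaultdict(float)
    for text, posted_at, forwarded_at in forwards:
        # Assumed meaning of "forwarding time difference":
        # seconds between the original post and the forward.
        delay = (forwarded_at - posted_at).total_seconds()
        # Count each distinct n-gram once per forwarded post.
        grams = {text[i:i + n] for i in range(len(text) - n + 1)}
        for gram in grams:
            count[gram] += 1
            total_delay[gram] += delay
    return {g: (count[g], total_delay[g] / count[g]) for g in count}


# Toy forwarding set with two forwarded posts (made-up data).
forwards = [
    ("abcd", datetime(2013, 1, 1, 8, 0), datetime(2013, 1, 1, 8, 10)),
    ("bcde", datetime(2013, 1, 1, 9, 0), datetime(2013, 1, 1, 9, 30)),
]
triples = build_triples(forwards)
```

In this toy input, the substring `bc` appears in both forwarded posts, so its triple carries frequency 2 and the average of the two delays. How such triples feed the HowNet-based similarity and the weight calculation is specified in the paper, not in this sketch.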
Source
Journal of Beijing Information Science and Technology University (《北京信息科技大学学报(自然科学版)》, Natural Science Edition)
2013, No. 3, pp. 27-33 (7 pages)
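Funding
National Natural Science Foundation of China (61271304)
Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation (KZ201311232037)
National Key Technology R&D Program project (2011BAH11B03)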
Keywords
microblog forwarding set
triple
similarity
microblog weight
filtering