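Abstract
This paper examines in depth Weibo search ranking algorithms based on the vector space model and on latent semantic analysis. Taking Sina Weibo as an example, an experimental system is built, experimental data are obtained through the API provided by the Sina Weibo open platform, and a worked example illustrates the processing steps of the vector space model and latent semantic analysis. Sina Weibo's existing ranking methods usually fail to provide satisfactory relevance-ordered results. Using the vector space model and latent semantic analysis, an "index term-post" matrix is constructed, and posts are segmented into words and vectorized. Measuring the relevance between a post and a query is recast as computing the similarity between the post vector and the query vector, so that processing posts and queries reduces to vector operations in a vector space. The experiments show that the ranking algorithm based on latent semantic analysis effectively improves the retrieval efficiency of posts.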
This paper presents a search ranking method for the Chinese microblog service Weibo, based on the vector space model and latent semantic analysis. Test data are obtained through the APIs provided by the Sina Weibo open platform. Weibo posts are represented under the vector space model as an "index term-post" matrix, and latent semantic analysis is then performed on this matrix. The relevance between a post and a query is recast as the similarity between the post vector and the query vector, computed as the cosine between the two vectors after SVD decomposition. Processing of posts and queries is thereby reduced to vector operations in a low-dimensional vector space. Posts are ranked by their relevance to the query rather than by simple string matching and descending post time, an approach widely used on many microblogging platforms. The experimental results indicate that the approach retrieves the relevant posts in the top-ranked positions.
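The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes posts are already segmented into tokens, uses raw term counts as weights (the abstract does not state the weighting scheme), and projects the query with the standard LSA fold-in formula $q_k = q^{T} U_k \Sigma_k^{-1}$. The toy posts, the vocabulary, and the function names are all hypothetical.

```python
import numpy as np

def term_post_matrix(posts, vocab):
    """Build the index term-post matrix: rows are terms, columns are posts."""
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(posts)))
    for j, tokens in enumerate(posts):
        for term in tokens:
            if term in index:
                A[index[term], j] += 1.0  # raw count weighting (assumption)
    return A

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))

def rank(scores):
    """Post indices sorted by descending score (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def vsm_scores(A, q):
    """Vector space model: cosine between each post column and the query."""
    return [cosine(A[:, j], q) for j in range(A.shape[1])]

def lsa_scores(A, q, k):
    """LSA: compare posts and query in the rank-k space from a truncated SVD.

    k must not exceed the rank of A, otherwise a singular value is zero.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    posts_k = Vt[:k, :].T        # row j is post j in the k-dimensional space
    q_k = (q @ U_k) / s_k        # fold the query into the same space
    return [cosine(posts_k[j], q_k) for j in range(posts_k.shape[0])]

# Hypothetical, pre-segmented toy posts (English tokens for readability).
posts = [
    ["weibo", "search", "ranking"],
    ["weibo", "search"],
    ["latent", "semantic", "analysis"],
    ["semantic", "ranking"],
]
vocab = sorted({t for p in posts for t in p})
A = term_post_matrix(posts, vocab)
q = term_post_matrix([["search"]], vocab)[:, 0]  # query as a term vector

vsm = vsm_scores(A, q)
lsa = lsa_scores(A, q, k=2)
vsm_order = rank(vsm)
lsa_order = rank(lsa)
print("VSM order:", vsm_order)
print("LSA order:", lsa_order)
```

In this toy data the plain vector space model gives a score of zero to every post that does not contain the query term. The low-rank LSA space can still give a nonzero score to a post that shares vocabulary with the matching posts, which is the effect the abstract relies on.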
Source
《常州大学学报(自然科学版)》
CAS
2013, Issue 3, pp. 71-75 (5 pages)
Journal of Changzhou University: Natural Science Edition
Funding
National Natural Science Foundation of China (61003163)
Jiangsu Provincial Department of Science and Technology project (BZ2010021)
Keywords
Weibo (微博)
vector space model (向量空间模型)
latent semantic analysis (潜在语义分析)
search ranking (搜索排序)