Abstract
Search engines are currently the main tool for Web information retrieval, yet their effectiveness is still unsatisfactory. Topic distillation algorithms based on Web link structure rely on iterative link analysis. This iteration often converges on a Tightly Knit Community (TKC), a densely interlinked region of the link graph that is only loosely related to the query topic, and the result is topic drift. Our analysis of HITS, the classical topic distillation algorithm, shows that it has further shortcomings: it assigns unequal influence weights to different Web sites, and it cannot satisfy users' information needs at multiple granularities. Building on a review of research on topic distillation algorithms, this paper proposes an improved algorithm, g-HITSc, that addresses these shortcomings. Experimental results show that the new algorithm is reasonable and effective.
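The abstract mentions HITS and topic drift toward a TKC but gives no formulas. The following is a minimal, illustrative sketch of the classic HITS power iteration (Kleinberg's hub/authority updates). It is not the paper's g-HITSc algorithm, whose details are not given here. The toy graph and node names are hypothetical. Two hub pages `q1` and `q2` point to an on-topic page `T`, and three pages `c1` to `c3` all link to one another, forming a small tightly knit community.

```python
import math

def hits(edges, nodes, iterations=50):
    """Classic HITS power iteration on a directed link graph.

    edges: list of (src, dst) hyperlinks; nodes: iterable of page ids.
    Returns (hub, authority) dicts, each normalised to unit L2 length.
    """
    nodes = list(nodes)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of p = sum of hub scores of the pages that link to p.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub of p = sum of (updated) authority scores of the pages p links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalise both vectors so the iteration converges instead of growing.
        na = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        nh = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / na for n, v in new_auth.items()}
        hub = {n: v / nh for n, v in new_hub.items()}
    return hub, auth

# Hypothetical toy graph: on-topic authority T plus a 3-page tightly knit community.
topic_links = [("q1", "T"), ("q2", "T")]
community = ("c1", "c2", "c3")
tkc_links = [(a, b) for a in community for b in community if a != b]
hub, auth = hits(topic_links + tkc_links, ["q1", "q2", "T", "c1", "c2", "c3"])

# The community's mutual links dominate the principal eigenvector, so T's
# authority score decays toward 0 while each c_i tends to 1/sqrt(3).
print(auth["T"], auth["c1"])
```

In this toy graph the community's block of the authority matrix `A^T A` has the larger principal eigenvalue (4, versus 2 for `T`), so the iteration shifts all the authority weight onto the community. This is the kind of topic drift the abstract attributes to TKCs.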
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals list
2004, No. 17, pp. 174-178 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 60173036)
Jiangsu Province "Tenth Five-Year Plan" High-Tech Project (Grant No. BG2001013)