摘要
PageRank算法是目前被广泛应用的一种度量网页重要性的方法,它根据网页之间的链接结构来给每个网页打分。从数学的角度来解释,PageRank可以被看作是一个马尔可夫随机游走模型,依据网页下一步的链出信息计算网页的转移概率。受计算机象棋算法设计中一个很成功的策略:“多看几步”的启发,改进和推广了经典PageRank算法,提出了更为一般的N-stepPageRank算法,它在计算网页的转移概率时利用了网页N步的链接信息。经典PageRank算法是N-stepPageRank算法N=1时的特殊情形。TREC标准数据集上的试验表明,N-stepPageRank算法能够有效地提高网页搜索的精确度,MAP指标比经典的PageRank的提高超过15%。
PageRank has been widely used to measure the importance of web pages based on their interconnections in the web graph. Mathematically speaking, PageRank can be explained using a Markov random walk model, in which only the direct out-links of a page contribute to the transition probability of this page. Improving the PageRank algorithm by looking N-step forward is proposed when constructing the transition probability matrix, the motivation comes from the similar "looking N-step forward" strategy that is successfully used in computer chess. It is clear that The N-step PageRank algorithm is the generalization of the classical one. Experimental results on the dataset of TREC Web track show that our proposed algorithm can boost the search accuracy of classical PageRank by more than 15% in terms of mean average precision.
出处
《科学技术与工程》
2007年第5期673-677,共5页
Science Technology and Engineering