Abstract
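With the rapid growth of the Internet, searching for information quickly, effectively, and accurately has become a pressing problem. To address the low execution efficiency and poor precision of traditional topic-based search algorithms, this paper designs a link-layering search algorithm based on machine learning. The algorithm uses machine learning to obtain page link patterns and assigns the nodes awaiting expansion to layers. It reaches the desired pages effectively and so avoids traversing large numbers of irrelevant pages, improving both the efficiency and the accuracy of retrieving topic-relevant pages. In an experiment searching for product-topic pages across 100 companies, the algorithm achieved good results, demonstrating good execution efficiency and practical feasibility.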
With the explosive growth of the World Wide Web, searching for information efficiently and accurately has become an urgent problem. Traditional topic-based search algorithms lack efficiency and accuracy. To resolve this problem, we propose a hyperlink hierarchical model and use a machine-learning method to learn it. Compared with traditional algorithms, our approach significantly improves crawling efficiency. In a test searching for product information on the websites of 100 companies, our algorithm performed effectively, demonstrating good efficiency and feasibility.
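The abstract gives no implementation details, so the following is only a minimal sketch of how a layered link search of this general kind might look, not the authors' algorithm. It assumes the "link pattern" is a set of URL-token weights learned from labeled example links, and that candidate links fall into three layers by score. The names `learn_link_pattern`, `layer_of`, `crawl`, the toy `SITE` graph, and the example URLs are all hypothetical.

```python
import heapq
import re
from collections import Counter


def tokens(url):
    """Split a URL into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", url.lower())


def learn_link_pattern(relevant_urls, irrelevant_urls):
    """Assumed learner: weight = (relevant links containing the token)
    minus (irrelevant links containing the token)."""
    pos = Counter(t for u in relevant_urls for t in set(tokens(u)))
    neg = Counter(t for u in irrelevant_urls for t in set(tokens(u)))
    return {t: pos[t] - neg[t] for t in set(pos) | set(neg)}


def layer_of(url, weights):
    """Assumed layering: 0 = promising, 1 = neutral, 2 = unpromising."""
    score = sum(weights.get(t, 0) for t in set(tokens(url)))
    if score > 0:
        return 0
    if score == 0:
        return 1
    return 2


def crawl(start, get_links, weights, max_pages):
    """Expand the lowest-numbered layer first; stop after max_pages visits."""
    frontier = [(layer_of(start, weights), 0, start)]
    seen = {start}
    order = []
    counter = 1  # tie-breaker: keeps discovery order within a layer
    while frontier and len(order) < max_pages:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (layer_of(link, weights), counter, link))
                counter += 1
    return order


# Hypothetical toy site used in place of real HTTP fetching.
SITE = {
    "http://example.com/": [
        "http://example.com/news",
        "http://example.com/products",
        "http://example.com/about",
    ],
    "http://example.com/products": ["http://example.com/products/catalog"],
}

weights = learn_link_pattern(
    relevant_urls=["http://a.example/products/list", "http://b.example/product/catalog"],
    irrelevant_urls=["http://a.example/news/2005", "http://b.example/about/news"],
)

visited = crawl("http://example.com/", lambda u: SITE.get(u, []), weights, max_pages=3)
print(visited)
```

With a budget of three pages, this sketch visits the start page and the two product pages and never expands the news or about links, which is the kind of saving on irrelevant pages the abstract describes.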
Source
《计算机仿真》
CSCD
2005, Issue 9, pp. 109-112 (4 pages in total)
Computer Simulation