
Design and Implementation of a Link Layering Algorithm Based on Topic Search

Link Delamination Algorithm for Topic Search
Abstract: With the rapid growth of the Internet, searching for information quickly, effectively, and accurately has become an urgent problem. Traditional topic-based search algorithms suffer from low execution efficiency and low precision. To address these shortcomings, this paper designs a machine-learning-based link layering search algorithm: it learns the link patterns of pages and uses them to assign the nodes awaiting expansion to layers. The algorithm reaches the desired pages without traversing large numbers of irrelevant pages, which improves both the efficiency and the accuracy of retrieving topic-relevant pages. In an experiment searching for product-topic pages on the websites of 100 companies, the algorithm achieved good results, which the authors take as evidence of its execution efficiency and practical feasibility.
Source: Computer Simulation (《计算机仿真》), CSCD, 2005, No. 9, pp. 109-112 (4 pages)
Keywords: link; search engine; topic search; Bayesian classifier
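
The abstract does not give the details of the learned model, so the following is only a minimal sketch of the general idea: a Bayesian classifier assigns each candidate link to a layer, and the crawler expands lower layers first. The token features (anchor text plus URL), the three-layer labelling, the toy training samples, and every name here (`tokenize`, `NaiveBayesLinkLayerer`, `order_frontier`, `SAMPLES`) are assumptions made for illustration, not the paper's method.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(link_text):
    """Split anchor text plus URL into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", link_text.lower())


class NaiveBayesLinkLayerer:
    """Multinomial naive Bayes with Laplace smoothing over link tokens."""

    def fit(self, samples):
        # samples: (link_text, layer) pairs; layer 0 is assumed closest to the topic.
        self.layer_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, layer in samples:
            self.layer_counts[layer] += 1
            for tok in tokenize(text):
                self.token_counts[layer][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, link_text):
        """Return the layer with the highest posterior log-probability."""
        total = sum(self.layer_counts.values())
        best_layer, best_score = None, -math.inf
        for layer in sorted(self.layer_counts):
            score = math.log(self.layer_counts[layer] / total)
            denom = sum(self.token_counts[layer].values()) + len(self.vocab)
            for tok in tokenize(link_text):
                score += math.log((self.token_counts[layer][tok] + 1) / denom)
            if score > best_score:
                best_layer, best_score = layer, score
        return best_layer


def order_frontier(links, model):
    """Return links in expansion order: lowest predicted layer first."""
    heap = [(model.predict(link), i, link) for i, link in enumerate(links)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical hand-labelled links: 0 = product pages, 1 = navigation, 2 = unrelated.
SAMPLES = [
    ("Products /products/index.html", 0),
    ("Product catalog /catalog/products", 0),
    ("About us /about.html", 1),
    ("Contact /contact.html", 1),
    ("Careers /jobs/careers.html", 2),
    ("News /news/press.html", 2),
]

model = NaiveBayesLinkLayerer().fit(SAMPLES)
frontier = order_frontier(
    [
        "Careers /careers.html",
        "About /about.html",
        "Our products /products/list.html",
    ],
    model,
)
print(frontier)
```

Ordering the frontier by predicted layer is what lets a crawler of this kind skip most irrelevant pages: links judged unlikely to lead to product pages are expanded last, or not at all if a page budget runs out first.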

