Abstract
The virtual organization (VO) is the basic organizational unit in grid architecture. Drawing on the analysis of VO characteristics in grid research, the concept of a virtual site can be defined for Web information retrieval: analysis of the link structure of Web pages shows that similar organizations exist on the Internet. Virtual sites share many features of virtual organizations, in particular some non-content features, which were used to select virtual site entry pages. Experiments show that these entry pages form a high-quality subset of the Web in terms of both content and link structure. Although the entry page set contains only about 21% of the pages in the whole collection, it covers more than 70% of the hyperlinks, and retrieval over this set performs more than 60% better than retrieval over all pages. This offers a way to improve Web information retrieval performance while reducing index size.
Source
《中文信息学报》
CSCD
Peking University Core Journal
2005, Issue 2, pp. 44-50 (7 pages)
Journal of Chinese Information Processing
Funding
National Key Basic Research Program (973) (2004CB318108)
Natural Science Foundation grants (60223004, 60321002, 60303005)
Keywords
computer application
Chinese information processing
Web information retrieval
non-content feature
virtual organization