Abstract
Multimedia resources are distributed across the Web in characteristic ways. Based on this, a focused spider was designed and implemented to collect Web pages that contain multimedia resources (audio, video and Flash animation). It uses a three-layer filter and a four-layer storage mechanism. The three filters are a link type filter, a page content filter and a link content filter. The four stores are a temporary page store, a target page store, an intermediate link store and an update store. Experimental results show that the focused spider effectively improves precision.
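The abstract names the three filters and four stores but does not say how each one decides or what it holds. The Python sketch below is a minimal, hypothetical illustration of how such a pipeline could fit together. The extension lists, topic words, regex-based link extraction and the in-memory `fetch` callable are all assumptions for illustration, not the authors' rules. Every helper name (`link_type_filter`, `page_content_filter`, `link_content_filter`, `Stores`, `crawl`) is likewise hypothetical.

```python
import posixpath
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set
from urllib.parse import urljoin, urlparse

# Hypothetical rules: the paper's actual extension lists and topic words are not given.
MEDIA_EXTS = {".mp3", ".wma", ".wav", ".rm", ".rmvb", ".avi", ".wmv", ".mpg", ".swf", ".flv"}
NON_PAGE_EXTS = {".jpg", ".gif", ".png", ".css", ".js", ".zip", ".exe", ".pdf"}
TOPIC_WORDS = ("music", "video", "movie", "flash", "mp3")

# Naive regex extraction (assumption); a real spider would use an HTML parser.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S)
RESOURCE_RE = re.compile(r'(?:src|href|value)="([^"]+)"', re.I)


def ext_of(url: str) -> str:
    """Lower-cased file extension of the URL path ('' if none)."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def link_type_filter(url: str) -> bool:
    """Layer 1 (assumed rule): keep only links that may lead to another HTML page."""
    ext = ext_of(url)
    return ext not in NON_PAGE_EXTS and ext not in MEDIA_EXTS


def page_content_filter(html: str) -> bool:
    """Layer 2 (assumed rule): a page is a target if it references a multimedia file."""
    return any(ext_of(u) in MEDIA_EXTS for u in RESOURCE_RE.findall(html))


def link_content_filter(anchor_text: str) -> bool:
    """Layer 3 (assumed rule): follow a link only if its anchor text mentions the topic."""
    text = anchor_text.lower()
    return any(word in text for word in TOPIC_WORDS)


@dataclass
class Stores:
    temporary: Dict[str, str] = field(default_factory=dict)  # fetched, not yet filtered
    target: Dict[str, str] = field(default_factory=dict)     # pages that contain multimedia
    middle_links: deque = field(default_factory=deque)       # links waiting to be fetched
    update: Set[str] = field(default_factory=set)            # target URLs to revisit later


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> Stores:
    """Breadth-first crawl that passes every page and link through the three filters."""
    stores = Stores()
    stores.middle_links.append(seed)
    seen = {seed}
    fetched = 0
    while stores.middle_links and fetched < max_pages:
        url = stores.middle_links.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        stores.temporary[url] = html
        if page_content_filter(html):
            stores.target[url] = html
            stores.update.add(url)
        for href, anchor in ANCHOR_RE.findall(html):
            link = urljoin(url, href)
            if link in seen or not link_type_filter(link) or not link_content_filter(anchor):
                continue
            seen.add(link)
            stores.middle_links.append(link)
        del stores.temporary[url]  # page fully processed; drop it from the temporary store
    return stores


# Usage on a tiny in-memory "web" (hypothetical pages).
WEB = {
    "http://example.com/": '<a href="/music.html">Music zone</a> <a href="/about.html">About us</a>',
    "http://example.com/music.html": '<embed src="song.mp3"> <a href="/video.html">Video list</a>',
    "http://example.com/video.html": "<p>no media here</p>",
}
stores = crawl("http://example.com/", WEB.get)
print(sorted(stores.target))  # ['http://example.com/music.html']
```

Passing `WEB.get` as `fetch` keeps the example offline. A real spider would fetch over HTTP, parse HTML properly, rank rather than simply drop links, and schedule the update store for periodic re-crawling.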
Source
Journal of Zhengzhou University: Natural Science Edition
Indexed in: CAS
2007, No. 2, pp. 42-45, 49 (5 pages)
Funding
Natural Science Foundation of Shandong Province
Grant No. y2005G21
Keywords
focused spider
link filter
content filter