Deep Web in Chinese: Size, Quality, Distribution
Abstract: The Deep Web contains a large amount of high-quality content that current search engine technology cannot yet search. Studying the size, quality, and distribution of the Deep Web helps identify effective methods and techniques for searching it. Using data collected by a web spider in October 2006 as a sample, this study applies quantitative methods (statistics and probability) together with qualitative methods to survey, for the first time, the size, quality, and distribution of the Chinese Deep Web. The main findings are: ① the Deep Web is more than 240 times the size of the Surface Web; ② it contains about 50.7 billion files, with a total storage volume of about 11,700 TB; ③ it has more than 30,000 searchable databases; ④ its content is of relatively high quality; ⑤ its content is unevenly distributed across topics.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI and the Peking University Core Journals list), 2008, No. 2, pp. 256-260 (5 pages)
Keywords: Deep Web; Web in Chinese; search engine