Research on Detecting Duplicated URL in Dual-Structural Network (original title: 双结构网络中URL去重机制研究; cited by: 1)
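Abstract: Given the characteristics of the dual-structural network and the challenges its URL deduplication faces, and based on the working principle of the Bloom Filter, a URL deduplication mechanism built on an extensible, dynamically splittable Bloom Filter is proposed, then implemented and deployed in a prototype system. Experimental results show that the mechanism is well suited to large-scale, high-performance, distributed crawler applications for dual-structural networks.
English abstract: In this paper, the concept of the Dual-Structural Network is first introduced and the principles of the Bloom Filter are surveyed. Then, the basic requirements for detecting duplicated URLs in a Dual-Structural Network are analyzed. Moreover, a dynamic splittable Bloom Filter for web crawlers is proposed, which can increase its capacity according to application requirements and fits large-scale, high-performance and distributed web crawlers. Finally, the feasibility and efficiency of the proposed Bloom Filter are demonstrated by a series of experiments.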
Authors: YUAN Zhiwei (袁志伟), YANG Peng (杨鹏), LIU Xuan (刘旋) (School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration of the Ministry of Education, Southeast University, Nanjing 211100, China)
Source: Journal of Taiyuan University of Technology (《太原理工大学学报》), indexed in CAS and the Peking University Core Journals list (北大核心), 2016, No. 1, pp. 68-74 (7 pages)
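Funding: National 863 Program project "Efficient content distribution technology based on content clustering and interest adaptation" (2013AA013503); National Natural Science Foundation of China project "Research on a new type of network with complementary dual structures and its key technologies" (61472080); Chinese Academy of Engineering consulting research project "Strategic research on the national second network" (2015-XY-04)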
Keywords: duplicated URL detection (Chinese keyword: 统一内容标签去重, literally "uniform content label deduplication"); dynamic splittable; Bloom filter; dual-structural network; web crawler
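The abstract describes a Bloom Filter for URL deduplication that can increase its capacity on demand. This record does not give the authors' splitting mechanism, so the following is only a minimal sketch of one common way to get a growable filter: keep a list of fixed-size Bloom filters and append a new one when the active one is full. The class names `BloomFilter` and `GrowableBloomFilter`, the method `add_if_new`, and the parameters `slice_capacity` and `error_rate` are all hypothetical and not taken from the paper.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        self.capacity = capacity
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0  # number of items inserted so far

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)
        self.count += 1


class GrowableBloomFilter:
    """Illustrative growable filter (an assumption, not the paper's design):
    a new fixed-size slice is appended whenever the active slice is full."""

    def __init__(self, slice_capacity: int = 1000, error_rate: float = 0.01):
        self.slice_capacity = slice_capacity
        self.error_rate = error_rate
        self.slices = [BloomFilter(slice_capacity, error_rate)]

    def __contains__(self, url: str) -> bool:
        # A URL counts as seen if any slice reports it.
        return any(url in s for s in self.slices)

    def add_if_new(self, url: str) -> bool:
        """Return True if the URL was not seen before and has now been recorded."""
        if url in self:
            return False  # seen before, or a false positive
        if self.slices[-1].count >= self.slice_capacity:
            self.slices.append(BloomFilter(self.slice_capacity, self.error_rate))
        self.slices[-1].add(url)
        return True
```

A crawler would call `add_if_new(url)` before enqueueing a link and skip the link when it returns `False`. Because every slice is consulted on lookup, the overall false-positive rate grows with the number of slices, which is one reason a real design may size or split its slices differently.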