
Domain Website File Crawling Based on Scrapy Crawler Framework

Cited by: 5
Abstract: To make downloading notices to airmen (NOTAMs) more efficient and reduce manual effort, the article uses the Scrapy crawler framework, combined with information classification techniques from natural language processing, to crawl the NOTAM texts published by civil aviation administrations. First, websites are categorized by their web data interaction mode, and pages are downloaded with the help of the Selenium automated testing tool. Then a Naive Bayes algorithm classifies all links on a website as target or non-target links, which makes it possible to extract the links to NOTAM texts. The classification model reaches an accuracy of 95.97% on domain-specific websites.
Authors: ZOU Wei; LI Tingyuan (School of Computer Science, Civil Aviation Flight University of China, Guanghan 618307, China)
Source: Modern Information Technology (《现代信息科技》), 2020, No. 21, pp. 6-9 (4 pages)
Keywords: Scrapy; crawler; Selenium; Naive Bayes
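
The link classification step described in the abstract can be illustrated with a small sketch. The abstract does not say which features or which Naive Bayes variant the authors use, so the code below makes its own assumptions: a multinomial Naive Bayes model with Laplace smoothing, tokens taken from the anchor text and URL of each link, and a handful of made-up training links on a hypothetical `caa.example` site. It is not the authors' implementation, only one way such a target/non-target link classifier might look.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(link_text, url):
    """Split anchor text and URL into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", (link_text + " " + url).lower())


class NaiveBayesLinkClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_counts = Counter()              # number of links per label
        self.token_counts = defaultdict(Counter)   # token frequencies per label
        self.vocabulary = set()

    def fit(self, samples):
        """samples: iterable of (link_text, url, label) tuples."""
        for link_text, url, label in samples:
            self.class_counts[label] += 1
            for token in tokenize(link_text, url):
                self.token_counts[label][token] += 1
                self.vocabulary.add(token)

    def predict(self, link_text, url):
        """Return the label with the highest log posterior score."""
        total_links = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_links)  # log prior
            denom = sum(self.token_counts[label].values()) + vocab_size
            for token in tokenize(link_text, url):
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training links; labels and URLs are invented for illustration.
TRAINING = [
    ("NOTAM list", "https://caa.example/notam/list", "target"),
    ("Download NOTAM bulletin", "https://caa.example/notam/2020-11.pdf", "target"),
    ("Current NOTAM summary", "https://caa.example/ais/notam/summary", "target"),
    ("About us", "https://caa.example/about", "other"),
    ("Contact", "https://caa.example/contact", "other"),
    ("Careers and jobs", "https://caa.example/careers", "other"),
]

classifier = NaiveBayesLinkClassifier()
classifier.fit(TRAINING)
```

In a crawler, a call such as `classifier.predict(anchor_text, href)` could decide whether a link extracted from a downloaded page is followed or skipped. A real model would need far more labeled links than this toy set.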