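Abstract: Bibliometrics is a quantitative method for tracking how a discipline develops. Traditional bibliometric studies rely on manual steps and a cumbersome workflow. To address this, we designed and implemented a discipline development trend analysis system based on a scrapy-redis distributed crawler. The system consists of two layers: (1) a data preprocessing layer that crawls and parses Web of Science literature data. This layer solves the problem of the crawler losing pages when network speed is unstable, which safeguards data integrity, and it includes an algorithm that dynamically computes the distribution of disciplines to which the cited references belong. (2) A result presentation layer built on Django, which shows the discipline trend analysis results to users through a web service. Users only need to enter the URL of the initial page to crawl in order to obtain the analysis results through the web service. The system gives bibliometrics a more convenient, faster, and highly extensible means of analysis.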
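To illustrate the reference-discipline step mentioned in the abstract, the sketch below keeps a running tally of disciplines as parsed references arrive and reports each discipline's share. The abstract does not describe the algorithm itself, so everything here is an assumption made for illustration: the class name `DisciplineTally`, the journal-to-discipline lookup table, the sample journal titles, and the rule that a reference from an unlisted journal is counted as "Unknown".

```python
from collections import Counter


class DisciplineTally:
    """Running count of disciplines across cited references (hypothetical helper)."""

    def __init__(self, journal_to_disciplines):
        # Assumed lookup: journal title -> list of discipline labels.
        self.lookup = {k.lower(): v for k, v in journal_to_disciplines.items()}
        self.counts = Counter()
        self.total = 0

    def add_reference(self, journal_title):
        # A journal may belong to several disciplines; count each label once.
        labels = self.lookup.get(journal_title.strip().lower(), ["Unknown"])
        self.counts.update(labels)
        self.total += 1

    def distribution(self):
        # Share of references that fall under each discipline.
        if self.total == 0:
            return {}
        return {d: n / self.total for d, n in self.counts.items()}


# Made-up sample data, not taken from Web of Science.
lookup = {
    "Journal A": ["Computer Science"],
    "Journal B": ["Computer Science", "Information Science"],
}
tally = DisciplineTally(lookup)
for title in ["Journal A", "journal b", "Journal C", "Journal A"]:
    tally.add_reference(title)
shares = tally.distribution()
```

Because a journal can carry more than one discipline label, the shares in this sketch need not sum to 1. The tally can be updated one reference at a time, which is one plausible reading of "dynamically" in the abstract.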
Funding: This work is supported by the State Grid Science and Technology Project under Grant Nos. 520613180002 and 62061318C002; the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201714); the Weihai Science and Technology Development Program (2016DXGJMS15); the Key Research and Development Program of Shandong Province (2017GGX90103); the Fujian Young and Middle-aged Teacher Education Research Project (Grant No. JAT160466); the Jiangsu Polytechnic College of Agriculture and Forestry Key R&D Projects (2018kj11); and Study and Development of Smart Agriculture Control System Based on Spark Big Data Decision (2017N0029).
Abstract: With the rapid development of social networks, public opinion monitoring based on them is becoming increasingly important. Many platforms have achieved some success in public opinion monitoring. However, these platforms perform poorly in scalability, fault tolerance, and real-time performance. In this paper, we propose a novel social-network-oriented public opinion monitoring platform based on ElasticSearch (SNES). First, SNES integrates a distributed crawler cluster module, which provides real-time access to social media data. Second, SNES integrates ElasticSearch, which can store and retrieve massive unstructured data in near real time. Finally, we design a subscription module based on Apache Kafka that connects the platform's modules through message push and consumption, improving message throughput and the capacity for dynamic horizontal scaling. Extensive empirical experiments show that the platform adapts well to social networks with highly real-time data and performs well in public opinion monitoring.
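The push-and-consume wiring described in the abstract can be illustrated in miniature. This is a sketch under stated assumptions, not the SNES implementation: a standard-library `queue.Queue` stands in for a Kafka topic, a dictionary of sets stands in for ElasticSearch, and the names `crawler`, `indexer`, `search`, and the sample posts are all hypothetical.

```python
import queue
import threading
from collections import defaultdict

topic = queue.Queue()        # stand-in for a Kafka topic (assumption)
index = defaultdict(set)     # stand-in for a search index: term -> post ids
STOP = object()              # sentinel that tells the consumer to exit


def crawler(posts):
    # Producer: push each crawled post onto the topic, then signal the end.
    for post in posts:
        topic.put(post)
    topic.put(STOP)


def indexer():
    # Consumer: pull posts off the topic and index them by lowercase term.
    while True:
        post = topic.get()
        if post is STOP:
            break
        for term in post["text"].lower().split():
            index[term].add(post["id"])


def search(term):
    # Return the ids of posts containing the term, in ascending order.
    return sorted(index.get(term.lower(), set()))


# Made-up sample posts for illustration only.
posts = [
    {"id": 1, "text": "Power outage reported downtown"},
    {"id": 2, "text": "Outage fixed after two hours"},
]
worker = threading.Thread(target=indexer)
worker.start()
crawler(posts)
worker.join()
hits = search("outage")
```

The producer and the consumer share only the queue, so neither needs to know how the other works. That decoupling is the property the abstract attributes to the Kafka-based subscription module; in this toy version, adding more consumers would also require one stop sentinel per consumer.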