Abstract
Open data sharing is increasingly important in the era of big data, and API interfaces play a key role in it. How to obtain data from open, shared API interfaces quickly, efficiently, and conveniently is a problem that urgently needs solving. Taking a practical perspective, this paper combines automated testing technology, optimal threading, Python, and ETL technology to construct a data collection strategy based on API interfaces. The strategy collects data quickly, is simple to operate, and allows the number of threads to be controlled. The paper also derives a formula for data collection time. The formula's accuracy exceeds 90% at 5 threads and above, reaches 97% with 7 to 8 threads, and can reach 99% with 9 to 10 threads. Before collection begins, the formula can be used to calculate the most reasonable collection time for the most reasonable number of threads, which greatly reduces collection time.
Source
Computer Science and Application (《计算机科学与应用》)
2020, No. 11, pp. 1927-1937 (11 pages)