Abstract
Objective To address the inefficiency of traditional methods for collecting, storing, and processing massive traditional Chinese medicine (TCM) data, and to explore new strategies for data management. Methods Based on the typical characteristics of TCM data, a Hadoop-based hierarchical management architecture was designed, and serial data mining algorithms were adapted to MapReduce. A single-node server and distributed clusters were deployed, and data acquisition experiments and serial versus parallel algorithm experiments were conducted on eight data sets of different sizes. Results In the non-distributed environment, data transfer time usually exceeded 3000 s and grew steeply, whereas in the distributed cluster it generally did not exceed 300 s and grew gently. Once the data size exceeded a certain range, the running time of the serial algorithm in the non-distributed environment increased sharply compared with that of the parallel algorithms in the pseudo-distributed and fully distributed environments. Conclusion Compared with a traditional single-node system, the Hadoop-based TCM data management platform collects, stores, and processes massive data markedly more efficiently, and is especially suitable for large-scale unstructured or semi-structured TCM data.
Authors
梁杨
丁长松
于俊洋
LIANG Yang; DING Chang-song; YU Jun-yang (School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, China; School of Information Science and Engineering, Central South University, Changsha 410083, China; Software School, Henan University, Kaifeng 475001, China)
Source
《中国中医药信息杂志》
CAS
CSCD
2018, Issue 5, pp. 96-100 (5 pages)
Chinese Journal of Information on Traditional Chinese Medicine
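Funding
National Key Research and Development Program of China (SQ2017YFC170323)
Hunan Provincial Key Research and Development Program (2017SK2111)
Hunan University of Chinese Medicine Research Fund for Young Teachers (99820001-221)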