Abstract
Proteomics mass spectrometry data are characterized by complex relationships, large volume, and diverse query patterns. Traditional storage systems for such data generally rely on files and relational databases, which usually require table structures to be defined in advance and therefore make it difficult to add varied proteomic information dynamically. In addition, the cluster architecture of a relational database is complex, its maintenance cost is high, and the code needed to work with it is complicated. To address the efficiency bottleneck of traditional storage systems in storing and accessing massive proteomics data, a NoSQL (non-relational) database was adopted, and a design for a proteomics data storage system based on the MongoDB distributed database storage architecture was proposed. Functional and performance tests of the system show that, as data volume and access load increase, MongoDB delivers higher performance and faster processing, and that the platform can alleviate some of the performance problems exposed by traditional file storage and relational database storage.
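The schema flexibility mentioned in the abstract can be illustrated with a minimal sketch. It is not the system described in the paper: plain Python dictionaries stand in for MongoDB documents, and the names `spectra`, `insert`, `find`, and all field names are hypothetical, chosen only for illustration. The point is that records with different sets of fields can sit in the same collection without a predefined table structure.

```python
from typing import Any

# Hypothetical in-memory "collection": a list of documents (dicts).
# This is an illustration only; no database server is involved.
spectra: list[dict[str, Any]] = []


def insert(doc: dict[str, Any]) -> None:
    """Store a document as-is; no schema is checked or required."""
    spectra.append(doc)


def find(query: dict[str, Any]) -> list[dict[str, Any]]:
    """Return documents whose fields equal every key/value in the query."""
    return [d for d in spectra if all(d.get(k) == v for k, v in query.items())]


# A basic spectrum record (field names are assumptions).
insert({"scan_id": 1, "precursor_mz": 445.12, "charge": 2})

# A later record carries extra fields; nothing has to be altered
# in advance to accommodate them.
insert({
    "scan_id": 2,
    "precursor_mz": 512.30,
    "charge": 3,
    "peptide": "PEPTIDEK",
    "modifications": ["Oxidation"],
})

hits = find({"charge": 3})
print([d["scan_id"] for d in hits])
```

In a real deployment, dictionaries of this shape could be handed to a MongoDB driver such as PyMongo (`Collection.insert_one`, `Collection.find`) instead of the in-memory list; a relational table would instead need its columns altered before the second record could be stored.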
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2016, Issue A01, pp. 232-236 (5 pages)
Journal of Computer Applications
Funding
International Science and Technology Cooperation Program of China (2014DFB30010)