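Abstract
In recent years, data volumes have grown explosively, making it difficult for traditional centralized databases to meet business requirements. Distributed databases, by contrast, store data across multiple nodes and scale better, so they can support continued business growth. Many enterprises have already built successful distributed database products, such as Google Spanner and Taobao's OceanBase. In traditional database schema design, the three normal forms (1NF, 2NF, and 3NF) and their extensions reduce data redundancy and update anomalies and preserve data integrity. In a distributed architecture, however, a schema that strictly follows the normal forms can suffer from problems such as low query efficiency, whereas a denormalized schema design can usually improve query efficiency substantially. OceanBase is a distributed database developed in-house by Taobao. It supports cross-row and cross-table transactions and performs well for OLTP, but its performance for OLAP workloads is relatively low. Taking OceanBase as an example, this paper describes how to use denormalization to design a distributed database schema that improves OLAP query performance, and verifies the effectiveness and efficiency of the denormalized schema design by deploying the TPC-H benchmark on OceanBase.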
Abstract: Recently, we have witnessed an exponential increase in the amount of data. As a result, a centralized database can hardly scale up to meet massive business requirements. A distributed database (DDB) is an alternative that scales to large applications by distributing data across multiple server nodes. Many enterprises have successfully implemented distributed databases, such as Google Spanner and Taobao OceanBase. In traditional database design theory, normal forms reduce update anomalies and data redundancy while ensuring data integrity. However, a schema design that strictly follows the normal forms leads to an inefficient distributed database system because it requires a large number of distributed relational operations. Fortunately, denormalization can significantly improve query efficiency by reducing the number of relations and hence the number of distributed relational operations. OceanBase, a distributed database developed by Taobao, delivers high performance for OLTP but not for OLAP. In this paper, we introduce how to use denormalization to design the schema for OceanBase and to improve its OLAP performance. Finally, we demonstrate the efficiency and effectiveness of the denormalized design for OceanBase in an empirical study using the TPC-H benchmark.
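The core trade-off described above can be illustrated with a toy model. The following is a minimal sketch, not the paper's method and not how OceanBase actually places data: it assumes a hypothetical three-node cluster (`NUM_NODES`) with a hypothetical modulo hash-partitioning rule (`node_of`), and borrows TPC-H-style column names (`c_custkey`, `c_mktsegment`, `o_totalprice`) purely for readability. A normalized query that aggregates order revenue by customer market segment must join ORDERS with CUSTOMER, and every customer row stored on a different node than its order costs a remote lookup; copying `c_mktsegment` into each order row turns the query into a single-relation scan.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size (assumption, for illustration only)


def node_of(key: int) -> int:
    """Hypothetical placement rule: hash-partition a row by its primary key."""
    return key % NUM_NODES


# Normalized (3NF) schema: the market segment is stored only in CUSTOMER.
customers = {1: "BUILDING", 2: "AUTOMOBILE", 3: "BUILDING"}  # c_custkey -> c_mktsegment
orders = [  # (o_orderkey, o_custkey, o_totalprice), made-up sample rows
    (10, 2, 100.0),
    (11, 1, 250.0),
    (12, 3, 75.0),
    (13, 2, 30.0),
]


def revenue_by_segment_normalized():
    """Join ORDERS with CUSTOMER and count lookups that cross node boundaries."""
    totals = defaultdict(float)
    remote_lookups = 0
    for orderkey, custkey, price in orders:
        if node_of(orderkey) != node_of(custkey):
            remote_lookups += 1  # the matching customer row lives on another node
        totals[customers[custkey]] += price
    return dict(totals), remote_lookups


# Denormalized schema: c_mktsegment is copied (redundantly) into every order row.
orders_wide = [
    (orderkey, custkey, price, customers[custkey])
    for orderkey, custkey, price in orders
]


def revenue_by_segment_denormalized():
    """Scan a single relation: no join is needed, so no cross-node lookups occur."""
    totals = defaultdict(float)
    for _orderkey, _custkey, price, segment in orders_wide:
        totals[segment] += price
    return dict(totals), 0


if __name__ == "__main__":
    print(revenue_by_segment_normalized())
    print(revenue_by_segment_denormalized())
```

Both versions return the same aggregates, but in this sample three of the four join lookups in the normalized version go to a remote node, while the denormalized version needs none. The price is the redundancy that normal forms exist to prevent: if a customer's `c_mktsegment` changes, every copied order row must be updated as well.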
Source
Journal of East China Normal University (Natural Science)
Indexed in: CAS, CSCD, Peking University Core Journals
2014, No. 5, pp. 290-300 (11 pages)
Funding
National Basic Research Program of China (973 Program) project (2010CB731402)
Keywords
denormalization
distributed database
OceanBase
TPC-H