Abstract
Dimension update (刷岗) is the process of backtracking over and overwriting existing data with the latest dimension values when the dimension fields of a stock keeping unit (SKU), such as its position, change. The practice is common at typical e-commerce companies such as Jing Dong (JD) and Taobao. To address the long running time of dimension update operations caused by the very large detail tables and dimension tables in JD Retail's business scenarios, an incremental dimension update method based on ClickHouse is proposed. First, the dimension table is loaded as a ClickHouse dictionary table, and the update is performed by joining the detail table with that dictionary table. Second, an incremental dimension update replaces the traditional full update, which both improves update efficiency and reduces the cluster resources the update consumes. Finally, data validation logic and a concurrency control mechanism are added to ensure data accuracy and cluster stability. The proposed method was compared with the traditional dimension update technique in a real production scenario on hundred-million-scale data. The experimental results show that, on identical hardware, the incremental method shortens the dimension update time by 80% and reduces the use of cluster resources (CPU, memory) by 50%, significantly improving dimension update efficiency for massive data.
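The three steps in the abstract (dictionary lookup instead of a table join, incremental instead of full update, then validation) can be illustrated with a minimal in-memory sketch. This is not ClickHouse code and not the authors' implementation: the Python dicts stand in for the dictionary table (in ClickHouse the lookup would typically be a `dictGet` call), and the names `changed_skus`, `incremental_update`, `validate`, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set

Row = Dict[str, str]  # one detail-table row: {"sku": ..., "position": ...}


def changed_skus(old_dim: Dict[str, str], new_dim: Dict[str, str]) -> Set[str]:
    """Return the SKUs whose position differs between two dimension snapshots."""
    return {sku for sku, pos in new_dim.items() if old_dim.get(sku) != pos}


def incremental_update(detail: List[Row],
                       old_dim: Dict[str, str],
                       new_dim: Dict[str, str]) -> int:
    """Overwrite the position only on rows whose SKU actually changed.

    new_dim plays the role of the dictionary table: a key-value lookup
    by SKU rather than a join. Returns the number of rows rewritten.
    """
    changed = changed_skus(old_dim, new_dim)
    touched = 0
    for row in detail:
        if row["sku"] in changed:
            row["position"] = new_dim[row["sku"]]
            touched += 1
    return touched


def validate(detail: List[Row], new_dim: Dict[str, str]) -> bool:
    """Check that every row with a known SKU carries the latest position."""
    return all(row["position"] == new_dim.get(row["sku"], row["position"])
               for row in detail)


# Hypothetical sample data: only sku2 changes position (B -> D).
old_dim = {"sku1": "A", "sku2": "B", "sku3": "C"}
new_dim = {"sku1": "A", "sku2": "D", "sku3": "C"}
detail = [
    {"sku": "sku1", "position": "A"},
    {"sku": "sku2", "position": "B"},
    {"sku": "sku2", "position": "B"},
    {"sku": "sku3", "position": "C"},
]

touched = incremental_update(detail, old_dim, new_dim)
is_valid = validate(detail, new_dim)
```

In this sketch a full update would rewrite all four rows, while the incremental pass rewrites only the rows for the changed SKU; the validation step then confirms that the detail rows agree with the latest dimension snapshot.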
Authors
季健
洪帅
陈洪健
钱叶
刘传耀
JI Jian; HONG Shuai; CHEN Hongjian; QIAN Ye; LIU Chuanyao (JD Retail, Beijing Wodong Tianjun Information Technology Company Limited, Shanghai 200443, China)
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2024, Issue S01, pp. 199-203 (5 pages)
Journal of Computer Applications