Abstract
As the scale of power grid operations grows, the volume of grid operation data is also increasing rapidly. Storing this data on local servers leads to slow data exchange and insufficient disaster tolerance, and it can easily cause data loss or service interruption. To address these shortcomings of the traditional grid operation data system architecture, this paper uses Hadoop distributed cloud storage as the storage carrier for grid operation data. It then classifies the data by introducing the TF-IDF algorithm and the LDA (Latent Dirichlet Allocation) topic classification algorithm, with the aim of improving the quality of power operation services. In the computing speed test, the Hadoop cloud storage system built in this paper showed a clear speed advantage over a single server when processing multiple large files. In the classification test, the proposed text classification algorithm divided the text data into distinct topics, so that operation service levels can be improved in a more targeted way.
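The abstract names TF-IDF as part of the classification pipeline but does not give its formula here. The following is a minimal sketch of TF-IDF keyword weighting that assumes the common definition tf(t, d) × log(N / df(t)) and documents that have already been split into tokens. Chinese text would first need word segmentation, which is not shown. The function names and sample records are hypothetical and illustrative only; this is not the paper's implementation.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for pre-tokenized documents (assumed variant).

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where N is the number of documents
               and df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # A term that appears in every document gets idf = log(1) = 0.
        weights.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def top_keywords(weights: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


if __name__ == "__main__":
    # Hypothetical, already-tokenized service records.
    records = [
        ["electricity", "bill", "bill", "payment"],
        ["power", "outage", "repair", "outage"],
        ["bill", "meter", "reading"],
    ]
    for w in tf_idf(records):
        print(top_keywords(w))
```

In this sketch, "bill" appears in two of the three records, so its idf is lower than that of terms unique to a single record. "outage" ranks first in the second record because it occurs twice there and nowhere else. In a pipeline like the one the abstract outlines, weights of this kind could be used to filter low-information terms before topic modelling. The exact role TF-IDF plays alongside LDA in the paper is not stated in this abstract.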
Authors
QUAN Longxiang (全龙翔), WANG Xixuan (王茜璇), AILI·Hairula (艾力·海如拉)
State Grid Xinjiang Marketing Service Center, Urumqi 830017, China
Source
Electronic Design Engineering (《电子设计工程》), 2023, No. 10, pp. 79-82, 87 (5 pages)
Funding
State Grid Corporation of China Science and Technology Project (JL71-15-042).