Abstract
The deserts, the Gobi, and other barren lands of western China have high-quality solar and wind energy resources. This power must be transmitted over long distances and in large volumes, so ultra-high-voltage (UHV) projects are becoming the main means of electricity transmission and have already entered a stage of large-scale construction. UHV engineering data are large in volume, highly interrelated, and poorly structured. As a result, traditional data collection and analysis methods based on expert experience can no longer keep pace with the rapidly growing volume of data. Knowledge graph technology can effectively structure engineering data. However, conventional substring-based named entity recognition (NER) enumerates candidate substrings, and most of these are negative samples, which harms model accuracy. To address this, this paper proposes an improved NER algorithm. First, the ontology layer of the knowledge graph is constructed from an analysis of typical UHV project texts. Second, a negative sampling technique that accounts for entity boundaries reduces the number of substring samples and improves NER efficiency. Finally, a relation extraction algorithm obtains entity pairs and their relation categories. Experiments show that the proposed algorithm achieves accuracy close to that of the reference algorithm while improving running efficiency by 9%, which verifies the effectiveness of the model.
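The abstract describes boundary-aware negative sampling only at a high level. The following minimal Python sketch rests on stated assumptions and is not the authors' exact rule. Candidate entities are all character substrings up to a hypothetical maximum length `max_len`. A negative span that overlaps a gold entity is treated as a "hard" boundary negative and always kept. The remaining "easy" negatives are randomly subsampled at a hypothetical rate `easy_ratio`. All function and parameter names are illustrative.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def enumerate_spans(n: int, max_len: int) -> List[Span]:
    """Enumerate every substring of a length-n text whose length is 1..max_len."""
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]


def overlaps_entity(span: Span, gold: List[Span]) -> bool:
    """True if the span overlaps any gold entity (partial, nested, or enclosing)."""
    s, e = span
    return any(s < ge and gs < e for gs, ge in gold)


def boundary_aware_sample(
    n: int,
    gold: List[Span],
    max_len: int = 8,        # hypothetical span-length cap
    easy_ratio: float = 0.1,  # hypothetical keep rate for easy negatives
    seed: int = 0,
) -> Tuple[List[Span], List[Span]]:
    """Return (positives, sampled negatives) under the assumed sampling rule."""
    rng = random.Random(seed)
    gold_set = set(gold)
    positives, hard, easy = [], [], []
    for span in enumerate_spans(n, max_len):
        if span in gold_set:
            positives.append(span)   # gold entity span
        elif overlaps_entity(span, gold):
            hard.append(span)        # wrong boundaries near an entity: always kept
        else:
            easy.append(span)        # far from any entity: subsampled
    sampled_easy = rng.sample(easy, int(len(easy) * easy_ratio))
    return positives, hard + sampled_easy


if __name__ == "__main__":
    pos, neg = boundary_aware_sample(10, [(0, 3), (5, 8)], max_len=8, easy_ratio=0.5)
    print(len(enumerate_spans(10, 8)), len(pos), len(neg))
```

Take a 10-character sentence with gold entities at (0, 3) and (5, 8), and set `max_len=8`. Full enumeration yields 52 candidate spans. The sketch keeps the 2 positives, all 44 overlapping negatives, and 3 of the 6 non-overlapping negatives. In longer texts with sparse entities, most negatives lie far from any entity, so subsampling the easy pool is where the sample reduction would come from.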
Authors
胡杰
许刚
齐立忠
郄鑫
HU Jie; XU Gang; QI Lizhong; QIE Xin (School of Electrical and Electronic Engineering, North China Electric Power University, Beijing 102206, China; State Grid Economic and Technological Research Institute Co., Ltd., Beijing 102200, China)
Source
《电网与清洁能源》 (Power System and Clean Energy)
CSCD
Peking University Core Journal (北大核心)
2023, Issue 11, pp. 1-8, 19 (9 pages)
Keywords
ultra-high voltage construction project
knowledge graph
natural language processing
named entity recognition
relationship recognition