Abstract
In the field of electricity safety, a large amount of data and knowledge has not been fully mined or utilized. Constructing a domain knowledge graph can not only integrate electricity safety knowledge but also greatly improve the working efficiency of the power industry. Named entity recognition (NER) is the foundation for constructing a knowledge graph. This paper studies NER based on dictionaries and rules, using three methods (a domain entity dictionary, rule matching on word-building feature characters, and rule matching on part-of-speech combination features) to accurately extract electricity safety entities from unstructured text, providing high-quality, high-precision entities for constructing a knowledge graph in the electricity safety domain. To optimize the recognition process and improve response speed, the general part-of-speech tagging task is handed to edge nodes, so the central server only needs to handle tasks such as rule template matching. In a small-scale test on electricity safety text, combining the three methods for domain entity recognition achieves an F1 score above 85%.
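The three matching methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entry, the feature characters, the part-of-speech pattern, the tag names, and all function names are hypothetical, chosen only to show how dictionary lookup, feature-character rules, and part-of-speech combination rules might be combined. The input is assumed to be already segmented and tagged, as it would be if an edge node had done the general part-of-speech tagging.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag), assumed to arrive pre-tagged

# Hypothetical resources for illustration; the paper's real dictionary and
# rule templates are not given in the abstract.
DOMAIN_DICT: Set[str] = {"漏电保护器"}
FEATURE_SUFFIXES: Tuple[str, ...] = ("器", "箱", "线")
POS_PATTERNS: List[Tuple[str, ...]] = [("vn", "n")]


def dict_match(text: str) -> List[str]:
    """Method 1: return dictionary entries that occur in the raw text."""
    return [entry for entry in sorted(DOMAIN_DICT) if entry in text]


def feature_char_match(tokens: Sequence[Token]) -> List[str]:
    """Method 2: keep nouns of length >= 2 that end in a feature character."""
    return [
        word
        for word, pos in tokens
        if pos.startswith("n") and len(word) >= 2 and word.endswith(FEATURE_SUFFIXES)
    ]


def pos_combo_match(tokens: Sequence[Token]) -> List[str]:
    """Method 3: join adjacent words whose tags match a POS pattern exactly."""
    found: List[str] = []
    for pattern in POS_PATTERNS:
        size = len(pattern)
        for start in range(len(tokens) - size + 1):
            window = tokens[start:start + size]
            if tuple(pos for _, pos in window) == pattern:
                found.append("".join(word for word, _ in window))
    return found


def recognize(tokens: Sequence[Token]) -> List[str]:
    """Combine the three methods; de-duplicate while keeping first-seen order."""
    text = "".join(word for word, _ in tokens)
    candidates = dict_match(text) + feature_char_match(tokens) + pos_combo_match(tokens)
    return list(dict.fromkeys(candidates))


def f1_score(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Entity-level F1: harmonic mean of precision and recall over entity sets."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = [
        ("漏电保护器", "n"), ("应", "v"), ("安装", "v"), ("在", "p"),
        ("配电箱", "n"), ("内", "f"), ("并", "c"), ("检查", "v"),
        ("接地", "vn"), ("电阻", "n"),
    ]
    print(recognize(sample))
```

In this sketch only the three matchers would need to run on the central server; the tagged tokens are treated as input. A real system would also have to resolve overlapping or conflicting matches, which this example does not attempt.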
Authors
袁金斗
潘明明
张腾
姜珏
Yuan Jindou; Pan Mingming; Zhang Teng; Jiang Jue (China Electric Power Research Institute, Beijing 100192, China; State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210000, China)
Source
《电子技术应用》
2022, Issue 12, pp. 22-27 (6 pages)
Application of Electronic Technique
Funding
Science and Technology Project of State Grid Corporation of China Headquarters (5400-202118164A-0-0-00).
Keywords
electricity safety domain
NER
domain dictionary
feature character rules
part-of-speech combination rules
edge computing