Abstract
This paper proposes an address grading model based on ERNIE sequence labelling for address extraction and recognition, recasting address grading as an NLP sequence labelling problem. First, the raw address text to be graded is fed into a trained ERNIE named entity recognition model, which yields a rough grading into 11 address levels. Next, the Aho-Corasick (AC) automaton algorithm is applied to complete or correct the first 5 address levels, and regular expression matching is then used to correct the last 4 levels. The proposed model not only improves the accuracy of address parsing but can also correct erroneous addresses. Finally, the model is applied to a real data set, which verifies the effectiveness of the method.
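The two correction stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ERNIE NER stage is omitted because it requires a trained model, and the gazetteer entries, level names (`city`, `district`, `door_number`, `building`, `unit`, `room`) and regular expressions are hypothetical placeholders, since the abstract does not list the actual dictionaries or rules. An AC automaton finds known upper-level place names in one pass and maps them to a canonical form, and regular expressions pick out the lower-level components.

```python
import re
from collections import deque

def build_automaton(words):
    """Build a minimal Aho-Corasick automaton (goto, fail, output tables)."""
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits

# Hypothetical gazetteer: surface form -> (level, canonical name).
GAZETTEER = {
    "上海": ("city", "上海市"),
    "上海市": ("city", "上海市"),
    "长宁": ("district", "长宁区"),
    "长宁区": ("district", "长宁区"),
}
AUTOMATON = build_automaton(GAZETTEER)

# Hypothetical patterns for the lower address levels.
LOWER_LEVEL_PATTERNS = {
    "door_number": re.compile(r"(\d+)号(?!楼)"),
    "building": re.compile(r"(\d+)(?:号楼|栋|幢)"),
    "unit": re.compile(r"(\d+)单元"),
    "room": re.compile(r"(\d+)室"),
}

def parse_address(text):
    """Complete upper levels via the automaton, then extract lower levels via regex."""
    levels, best_len = {}, {}
    for _, surface in find_all(text, AUTOMATON):
        level, canonical = GAZETTEER[surface]
        if len(surface) > best_len.get(level, 0):  # prefer the longest match
            best_len[level] = len(surface)
            levels[level] = canonical
    for level, pattern in LOWER_LEVEL_PATTERNS.items():
        match = pattern.search(text)
        if match:
            levels[level] = match.group(1)
    return levels
```

For example, `parse_address("上海长宁镇宁路200号3栋502室")` completes the abbreviated "上海" and "长宁" to "上海市" and "长宁区" and extracts door number 200, building 3 and room 502. In the pipeline the abstract describes, these steps would refine the labels produced by the NER model rather than run on raw text alone.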
Authors
刘贤松
屠梓浩
高有利
Liu Xiansong; Tu Zihao; Gao Youli (China Unicom Network AI Center, Shanghai 200050, China)
Source
《邮电设计技术》
2023, Issue 2, pp. 89-92 (4 pages)
Designing Techniques of Posts and Telecommunications
Keywords
Address grading
Address extraction
Sequence labelling
ERNIE algorithm