Abstract
Addresses are a fundamental data resource for social development and have become an essential component of urban geospatial data construction. Geographical name matching determines whether a pair of strings refers to the same real-world location. Current geographical name matching methods rely on a single string similarity metric or on a combination of several. They capture neither the context of long address strings nor the character substitutions that arise when geographical names change across languages and cultures, so they cannot fully understand what an address means. This paper therefore proposes a geographical name matching method based on XLNet, which uses a deep neural network to classify a pair of geographical names as match or non-match. The method draws on long-range memory and uses two-stream attention masks to reconstruct the event sequence, so that representations are built from bidirectional information. Experimental results show that the method handles long-address matching, that the model captures contextual semantic information well, and that it outperforms the single similarity metrics and supervised machine learning methods of previous studies.
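For orientation, the kind of single string similarity baseline that the abstract contrasts against can be sketched in a few lines. This is not the authors' XLNet method. The function names `similarity` and `is_match`, the normalization step, and the 0.8 threshold are all assumptions chosen for illustration, and the score comes from Python's standard `difflib.SequenceMatcher` rather than any metric named in the paper.

```python
from difflib import SequenceMatcher


def similarity(name_a: str, name_b: str) -> float:
    """Return a character-level similarity score in [0, 1].

    Both names are lowercased and stripped first (an illustrative
    normalization choice, not one taken from the paper).
    """
    a = name_a.strip().lower()
    b = name_b.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def is_match(name_a: str, name_b: str, threshold: float = 0.8) -> bool:
    """Classify a pair as match / non-match with a fixed cut-off.

    The 0.8 threshold is a hypothetical value; in practice it would
    be tuned on labeled pairs.
    """
    return similarity(name_a, name_b) >= threshold
```

A score like this looks only at surface characters, so two long addresses that name the same place in a different order or wording can fall below the threshold. That limitation is what motivates a context-aware classifier.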
Authors
ZHENG Shiyu (郑诗语)
QIU Qinjun (邱芹军)
XIE Zhong (谢忠)
TAO Liufeng (陶留锋)
LI Weijie (李伟杰)
Affiliations: School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074, China; Laboratory of National Joint Engineering for Geo-information System, Wuhan 430074, China; Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518000, China
Source
Geospatial Information (《地理空间信息》)
2024, No. 8, pp. 59-63, 88 (6 pages)
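Funding
National Key Research and Development Program of China (2022YFB3904200, 2022YFF0711601)
Natural Science Foundation of Hubei Province (2022CFB640)
Director's Fund of the Key Laboratory of Geological Survey and Evaluation of Ministry of Education (GLAB2023ZR01)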