Abstract
With the development of volunteered geographic information (VGI) and of techniques for segmenting Chinese addresses into address elements, the quality of the resulting address elements needs to be evaluated. The quality of address elements produced by segmenting Chinese address text is difficult to evaluate effectively. To address this, this paper proposes a credibility evaluation method for address elements that is supported by multi-source data and web search. First, a Chinese word segmentation tool segments each address element and tags the parts of speech of its words. The credibility of the element's naming structure is then calculated by analyzing word frequencies and part-of-speech combination patterns. Second, the extent to which multi-source data support each address element is mined from large-scale address samples, road data, and POI data, and a data support score is calculated. Third, each address element is queried through a search engine, and its network credibility is calculated from the returned results and the number of hits. Finally, a comprehensive credibility model is proposed that combines these measures into an overall credibility score for each address element. Experimental results show that the model and method not only compute the credibility of address elements in Chinese address text efficiently, but also effectively reveal problems such as obscure or false address elements. This provides a reference for the automated detection and standardization of address elements.
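The abstract names four scores (naming-structure credibility, data support, network credibility, and a combined credibility) but does not give their formulas. The Python 3.9+ sketch below shows one way such a pipeline could be wired together. It assumes that segmentation and part-of-speech tagging have already been done, and that search-engine hit counts are obtained elsewhere and passed in. Every function name, normalization choice, weight, and toy data value here is hypothetical and illustrative. None of it is the authors' actual model.

```python
import math
from collections import Counter
from collections.abc import Iterable, Sequence

# A segmented address element: (word, part-of-speech tag) pairs.
# ASSUMPTION: tagging was done beforehand by some Chinese segmenter;
# the tags used below ("ns", "n") are illustrative only.
Token = tuple[str, str]


def pos_pattern(tokens: Sequence[Token]) -> str:
    """Join the POS tags of an element into a pattern string, e.g. 'ns+n'."""
    return "+".join(tag for _, tag in tokens)


def structure_credibility(tokens: Sequence[Token],
                          reference: Iterable[Sequence[Token]]) -> float:
    """HYPOTHETICAL score: how often this element's POS pattern occurs in a
    reference sample, normalized by the most frequent pattern (range 0..1)."""
    counts = Counter(pos_pattern(t) for t in reference)
    if not counts:
        return 0.0
    return counts[pos_pattern(tokens)] / max(counts.values())


def data_support(name: str, sources: dict[str, set[str]]) -> float:
    """HYPOTHETICAL score: share of data sources (e.g. address samples,
    roads, POIs) whose set of names contains the element."""
    if not sources:
        return 0.0
    hits = sum(1 for names in sources.values() if name in names)
    return hits / len(sources)


def network_credibility(hit_count: int, saturation: int = 10_000) -> float:
    """HYPOTHETICAL score: log-scaled search hit count, capped at 1.0.
    No search is performed here; the hit count is supplied by the caller."""
    if hit_count <= 0:
        return 0.0
    return min(1.0, math.log10(1 + hit_count) / math.log10(1 + saturation))


def combined_credibility(scores: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted sum of component scores; weights are assumed to sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


if __name__ == "__main__":
    # Toy data, invented for illustration.
    reference = [
        [("安宁", "ns"), ("路", "n")],
        [("西津", "ns"), ("路", "n")],
        [("西站", "ns")],
    ]
    element = [("安宁", "ns"), ("路", "n")]
    s1 = structure_credibility(element, reference)
    s2 = data_support("安宁路", {"addresses": {"安宁路"},
                                 "roads": {"安宁路"},
                                 "pois": set()})
    s3 = network_credibility(10_000)
    score = combined_credibility([s1, s2, s3], [0.3, 0.4, 0.3])
    print(f"{score:.3f}")  # prints 0.867
```

A weighted sum is used only because it is the simplest way to combine bounded component scores. Because each component is scaled to [0, 1], a weighted sum with weights that add to 1 also stays in [0, 1], so low-scoring elements can be flagged with a single threshold.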
Authors
孙立财 (SUN Licai)
陈以松 (CHEN Yisong)
熊杰 (XIONG Jie)
罗安 (LUO An)
王勇 (WANG Yong)
Affiliations
Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China
Chinese Academy of Surveying and Mapping, Beijing 100036, China
National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, China
Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, China
China Telecom Corporation Limited Sichuan Branch, Chengdu 610015, China
Source
Bulletin of Surveying and Mapping (《测绘通报》)
Indexed in CSCD; Peking University core journal (北大核心)
2021, No. 10, pp. 108-113 (6 pages)
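Funding
Lanzhou Jiaotong University Excellent Platform Program (201806)
National Key Research and Development Program of China (2017YFB0503502, 2017YBF0503601)
Basic Scientific Research Fund of the Chinese Academy of Surveying and Mapping (AR2011)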
Keywords
multi-source data
address element
credibility evaluation
Chinese word segmentation
normalization