Improvement and realization of address matching based on adaptive word segmentation in Lucene (cited by: 4)
Abstract To improve the segmentation adaptability and matching accuracy of a geocoding system for input addresses, this paper proposes an improved address matching method, based on the Lucene indexing and query mechanism, that can adapt to non-standard Chinese addresses. First, a hierarchical index library of address elements is built according to Chinese address patterns. Second, a pinyin ternary search trie, synonym configuration, and unregistered (out-of-vocabulary) word configuration are integrated into the IK tokenizer. Finally, after the initial set of matches is obtained, the edit (Levenshtein) distance is computed for each result, and the results are sorted to select the returned value. The matching system builds its word segmentation dictionary and index library from public security address data and administrative legal-entity address data for Taizhou City, Zhejiang Province. The results indicate that the method achieves adaptive word segmentation of input addresses, matches non-standard Chinese addresses well, and can serve related applications in surveying, mapping, and geographic information.
Authors ZHANG Chen; CHEN Zhangjian; LIU Jiangtao; REN Fu; ZHANG Hongwei (Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen, Guangdong 518034, China; School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China; Zhejiang Academy of Surveying and Mapping, Hangzhou 311000, China; Information Center of Planning Land Real-Estate of Shenzhen, Shenzhen, Guangdong 518040, China)
Source Science of Surveying and Mapping (《测绘科学》), CSCD, Peking University Core Journal, 2021, No. 10, pp. 185-193 (9 pages)
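Funding Supported by the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources (KF201602028).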
Keywords address matching; geocoding; address tree model; Lucene full-text retrieval; address segmentation; Chinese non-standard address; address standardization
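The last step described in the abstract, computing the edit distance over the initial match set and sorting to pick the returned value, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `levenshtein` and `rerank`, the `top_k` parameter, and the sample address strings are all hypothetical, and the candidate list simply stands in for whatever the Lucene query returns.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rerank(query: str, candidates: Sequence[str], top_k: int = 1) -> List[str]:
    """Sort candidates by ascending edit distance to the query.

    Python's sort is stable, so ties keep their original (retrieval) order.
    """
    return sorted(candidates, key=lambda c: levenshtein(query, c))[:top_k]


# Hypothetical input address and hypothetical first-pass match set.
query = "椒江区市府大道10号"
candidates = [
    "台州市椒江区市府大道100号",
    "台州市椒江区市府大道10号",
    "台州市黄岩区市府大道10号",
]
best = rerank(query, candidates, top_k=1)
```

A plain character-level distance is assumed here for simplicity; the abstract does not say whether the distance is computed over characters or over segmented address elements.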