Abstract
Web text contains a wealth of spatial information in the form of place names and addresses, but because these are described in arbitrary and varied ways, the information is difficult to identify quickly and accurately. After analyzing how place names and addresses are composed in web text, and taking their event attributes into account, this paper proposes an information extraction method based on a library of "place name and address genes". An extraction rule tree is built from extraction factors such as event relevance, character length, and word frequency of the address, and is used to obtain the target place names and addresses. Tests on real data show that the method is more targeted for place name and address extraction and improves both efficiency and accuracy.
Authors
杜中波
刘新
宋婷婷
梁冰
周新宇
DU Zhongbo; LIU Xin; SONG Tingting; LIANG Bing; ZHOU Xinyu (College of Geomatics, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; Key Laboratory of Fundamental Geographic Information and Digital Technology of Shandong Province, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; Chinese Academy of Surveying and Mapping, Beijing 100036, China; Urban Planning Management Information Center of Beijing Xicheng District, Beijing 100035, China)
Source
《测绘科学》
CSCD
Peking University Core Journal
2019, Issue 4, pp. 196-202 (7 pages)
Science of Surveying and Mapping
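Funding
Special Scientific Research Fund for Public Welfare in Surveying, Mapping and Geoinformation (201512020)
Basic Scientific Research Fund of the Chinese Academy of Surveying and Mapping (7771607)
Xicheng District Science and Technology Project (SD2015-25)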
Keywords
place name and address gene (地名地址基因)
web page information (网页信息)
event attributes (事件属性)
rule tree (规则树)