Abstract
Extracting numeric (attribute-value) relations between Chinese entities has broad application prospects. Mainstream entity relation extraction methods are supervised and are mostly applied to short texts and simple sentences, so they are poorly suited to massive collections of complex sentences. For the large volume of complex text found on the Web, this paper proposes an unsupervised method for extracting numeric relations between Chinese entities. Building on the output of Chinese word segmentation and part-of-speech tagging, the method first analyzes sentence patterns and uses a selection tree algorithm to construct a candidate set, then filters the candidates using the Jaro-Winkler distance, and finally extracts numeric (attribute-value) relation triples. Experiments on data from three industries (iron and steel, shipbuilding, and real estate) show that the method is effective for extracting numeric relations between Chinese entities.
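The abstract names the Jaro-Winkler distance as the filtering step but does not say how it is applied. As a rough illustration only, the sketch below implements the standard Jaro-Winkler similarity and uses it to keep candidate strings that are close to a reference term. The function `filter_by_similarity`, the reference term, the threshold of 0.8, and the sample candidates are all assumptions made for this example; they are not taken from the paper.

```python
from typing import List


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def filter_by_similarity(candidates: List[str], reference: str,
                         threshold: float = 0.8) -> List[str]:
    """Hypothetical filter: keep candidates similar enough to a reference term."""
    return [c for c in candidates if jaro_winkler(c, reference) >= threshold]


if __name__ == "__main__":
    # Illustrative attribute names only; not data from the paper.
    candidates = ["粗钢产量", "粗钢总产量", "房价"]
    print(filter_by_similarity(candidates, "粗钢产量"))
```

Because the similarity works on characters, it applies to Chinese strings without further tokenization. In practice the threshold would need tuning per domain.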
Authors
吴胜
刘茂福
胡慧君
张志清
顾进广
WU Sheng, LIU Maofu, HU Huijun, ZHANG Zhiqing, GU Jinguang (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, Hubei, China; School of Management, Wuhan University of Science and Technology, Wuhan 430081, Hubei, China)
Source
《武汉大学学报(理学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, No. 6, pp. 552-560 (9 pages)
Journal of Wuhan University:Natural Science Edition
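Funding
Major Program of the National Social Science Foundation of China (11&ZD189)
General Program of the Natural Science Foundation of Hubei Province (2015CFB564)
Guiding Project of the Science and Technology Research Program of the Hubei Provincial Department of Education (B2016010)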
Keywords
entity relation extraction
unsupervised
attribute-value relation triple (numeric triple)
information extraction