Abstract
To address the application of Web information extraction (WIE) technology in the health domain, a health-domain Web information extraction method based on Web Harvest is proposed. Extraction rules for health entities were designed by analyzing the structure of different health websites, and an algorithm based on Web Harvest was implemented to automatically extract health entities and their attributes. The extracted entities and attributes were checked for consistency and then stored in a relational database. Attribute values in the database that implicitly contain health entities were then processed with the Ansj natural language processing method for entity recognition, from which the relationships between health entities were extracted. In the health entity extraction experiments, the method reached an average F-measure of 99.9%; in the entity relationship extraction experiments, it reached an average F-measure of 80.51%. The experimental results indicate that the health information extracted by the proposed method is of high quality and credibility.
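For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure can be computed from a set of extracted entities and a reference set. It assumes the balanced F1 variant (equal weight on precision and recall); the abstract does not state which variant the authors used. The function name `f_measure` and the sample entity names are hypothetical and are not taken from the paper.

```python
def f_measure(extracted, gold):
    """Return (precision, recall, F1) for extracted items against a reference set.

    Assumption: balanced F1. The paper's exact F-measure variant is not given
    in the abstract.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)  # items that are both extracted and correct

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data, for illustration only.
extracted_entities = {"diabetes", "hypertension", "asthma", "aspirin"}
reference_entities = {"diabetes", "hypertension", "asthma", "insulin"}
precision, recall, f1 = f_measure(extracted_entities, reference_entities)
```

An average F-measure such as the ones quoted in the abstract would then be the mean of this value over the individual test sites or entity types, depending on how the experiments were grouped.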
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2016, Issue 1, pp. 163-170 (8 pages in total)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (61073057)
Keywords
information extraction
health information extraction
consistency check
entity recognition
entity relationship extraction