Abstract
Formal description of semi-structured data and information extraction from it are core problems in answering user queries and supporting information access. As information resources diversify and grow rapidly, existing description and extraction methods suffer from low recall and low precision. To address this, the paper proposes a new formal description method for semi-structured data that redefines the domain concept set and the domain knowledge set. On this basis, it presents the construction process for both sets, including automatic extraction of domain concepts, automatic construction of relations in the domain knowledge set, and a description of the similarity algorithm. Experimental results show that the proposed method achieves higher recall and precision than existing methods, indicating that it is feasible and effective.
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
PKU Core Journals (北大核心)
2013, No. 4, pp. 145-148 (4 pages)
Funding
Natural Science Research Program of the Education Department of Henan Province (2010C520007)
Keywords
Semi-structured data
Formal description
Domain concept set
Domain knowledge set
Data extraction