Research on Formal Description of Semi-structured Data and Data Extraction Method    Cited by: 3

Abstract: Formal description and information extraction of semi-structured data are core problems in answering user queries and acquiring information. As information resources diversify and grow rapidly, existing description and extraction methods suffer from low recall and low precision. To address this, this paper proposes a new formal description method for semi-structured data that redefines the domain concept set and the domain knowledge set. On this basis, it gives the construction process for both sets: automatic extraction of domain concepts, automatic construction of the relations in the domain knowledge set, and a description of the similarity algorithm. Experimental results show that the proposed description method achieves higher recall and precision than existing methods and is both feasible and effective.
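The abstract evaluates the method by recall and precision but does not spell out its similarity algorithm. The sketch below is a minimal illustration under stated assumptions, not the authors' method: the concept-to-term dictionary `DOMAIN_CONCEPTS`, the helpers `jaccard`, `classify_fragment` and `precision_recall`, whitespace tokenization, Jaccard similarity (a stand-in swapped in for the paper's unspecified measure) and the 0.15 threshold are all hypothetical. It shows how a text fragment might be matched against a domain concept set and how precision and recall are computed against hand-labelled data.

```python
from __future__ import annotations

# Hypothetical domain concept set: each concept is described by a set of
# indicative terms. This structure is an illustrative assumption, not the
# definition of the domain concept set used in the paper.
DOMAIN_CONCEPTS: dict[str, set[str]] = {
    "price": {"price", "cost", "yuan", "usd"},
    "author": {"author", "writer", "by"},
    "date": {"date", "published", "year"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_fragment(fragment: str, threshold: float = 0.15) -> str | None:
    """Return the most similar concept, or None if no score reaches threshold."""
    tokens = set(fragment.lower().split())  # naive whitespace tokenization
    best_concept, best_score = None, 0.0
    for concept, terms in DOMAIN_CONCEPTS.items():
        score = jaccard(tokens, terms)
        if score > best_score:  # strict '>' keeps the first concept on ties
            best_concept, best_score = concept, score
    return best_concept if best_score >= threshold else None


def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision = hits / |retrieved|; recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data: fragments and hand-labelled (fragment index, concept) pairs.
fragments = ["Price 25 yuan", "Author Zhang San", "Published 2013",
             "Free shipping", "Costs 30", "By the way"]
gold = {(0, "price"), (1, "author"), (2, "date"), (4, "price")}

extracted = set()
for i, fragment in enumerate(fragments):
    concept = classify_fragment(fragment)
    if concept is not None:
        extracted.add((i, concept))

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

In this toy run, "Costs 30" is missed because exact token matching does not relate "costs" to "cost", which lowers recall, and "By the way" is wrongly tagged as `author` because it shares the token "by", which lowers precision. These are the two kinds of error that the recall and precision figures in the abstract measure.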
Source: Computer Applications and Software (《计算机应用与软件》), CSCD; Peking University Core Journal; 2013, No. 4, pp. 145-148 (4 pages)
Funding: Natural Science Research Program of the Education Department of Henan Province (2010C520007)
Keywords: semi-structured data; formal description; domain concept set; domain knowledge set; data extraction