Abstract
As XML documents come into wide use, managing the quality of XML data with entity recognition techniques has become very important. In XML, entity recognition (also called entity identification) is mainly used to find different descriptions of the same entity within XML documents. In data quality management it supports tasks such as error detection and data integration. Because XML documents are semi-structured, entity recognition on XML differs considerably from entity recognition on plain text and on relational data. This paper introduces the concept and applications of entity recognition on XML documents and discusses the concepts and principles of several XML entity recognition techniques. It presents the corresponding tree matching algorithms and closes with conclusions and an outlook on future research directions.
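The abstract mentions tree matching algorithms without giving them. The following is an illustration only and is not the paper's algorithm. It uses W. Yang's Simple Tree Matching, a well-known top-down tree matching algorithm, to score whether two XML elements might describe the same entity. Several choices are assumptions made for this example: leaf text is folded into the node label, the score is normalized as 2·STM/(|A|+|B|), the `<book>` records are samples, and the 0.7 threshold is hypothetical.

```python
import xml.etree.ElementTree as ET


def _label(node: ET.Element) -> tuple:
    """Node label: the tag, plus whitespace/case-normalized text for leaves (an assumption)."""
    if len(node) == 0:
        text = " ".join((node.text or "").split()).lower()
        return (node.tag, text)
    return (node.tag, None)


def simple_tree_match(a: ET.Element, b: ET.Element) -> int:
    """Size of the maximum order-preserving, top-down matching between trees a and b."""
    if _label(a) != _label(b):
        return 0  # roots differ, so no node below them can be matched in a top-down matching
    ca, cb = list(a), list(b)
    # m[i][j]: best matching between the first i children of a and the first j children of b
    m = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            m[i][j] = max(
                m[i][j - 1],
                m[i - 1][j],
                m[i - 1][j - 1] + simple_tree_match(ca[i - 1], cb[j - 1]),
            )
    return m[len(ca)][len(cb)] + 1  # +1 for the matched roots


def tree_size(node: ET.Element) -> int:
    """Number of element nodes in the tree, including the root."""
    return sum(1 for _ in node.iter())


def similarity(a: ET.Element, b: ET.Element) -> float:
    """Normalize the match size to [0, 1]."""
    return 2 * simple_tree_match(a, b) / (tree_size(a) + tree_size(b))


SAME_ENTITY_THRESHOLD = 0.7  # hypothetical cut-off; would need tuning on real data

# Hypothetical sample records: b differs from a only in case and year; c is another book.
a = ET.fromstring("<book><title>XML Data Quality</title><author>Li Wei</author><year>2014</year></book>")
b = ET.fromstring("<book><title>xml data quality</title><author>Li Wei</author><year>2013</year></book>")
c = ET.fromstring("<book><title>Graph Databases</title><author>Zhang San</author><year>2014</year></book>")

for other in (b, c):
    score = similarity(a, other)
    print(score, score >= SAME_ENTITY_THRESHOLD)
# prints "0.75 True" then "0.5 False"
```

Records `a` and `b` share the root, title and author (3 of 4 nodes each), so they are flagged as possible duplicates despite the conflicting year. That conflict is the kind of error that detection would then surface. Because Simple Tree Matching is strictly top-down and order-preserving, reordered or renamed elements lower the score. Real XML entity identification would typically need more tolerant matching than this sketch.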
Source
Computer Technology and Development (《计算机技术与发展》), 2014, No. 10, pp. 84-87 (4 pages)
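Funding
General Project of Humanities and Social Sciences Research, Ministry of Education (12YJC870030)
Liaoning Province Educational Science "12th Five-Year Plan" Project (JG12DB149)
Liaoning Province Social Science Planning Fund Project (L12CTQ008)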
Keywords
XML documents
entity recognition
data quality