
A Method of Discovering Relation Information from XML Data

Cited by: 10
Abstract: This paper proposes a new method for discovering relation information, and the patterns in which it occurs, hidden in the different nested structures of XML documents. Guided by the user's interests, the method discovers relations that describe connections between specified types of entities, and extracts relation instances together with their occurrence patterns in the documents. The solution works in three steps. First, it identifies and collects the XML document fragments that contain all entity types of interest to the user. Next, it computes the similarity between fragments from both the semantics of their tags and their structure, and clusters the fragments by similarity using an adaptively selected threshold, so that fragments containing the same relation fall into the same cluster. Finally, it extracts relation instances and their occurrence patterns from each fragment cluster. Experimental results show that, for a wide range of XML documents with meaningful tags, the method accurately identifies and extracts relation information describing the connections among the specified entity types.
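The abstract names the three steps but not their exact formulas. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation: entity types are assumed to be given as tag names; a fragment is taken to be the lowest subtree containing every entity type; tag semantics are approximated by a hypothetical `SYNONYMS` table; similarity is an equally weighted mix (hypothetical `w=0.5`) of tag-name and tag-path Jaccard overlap; the threshold is placed at the widest gap between observed similarity values (a hypothetical stand-in for the paper's adaptive rule); and clustering is single link. All function names and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical synonym table standing in for tag semantics; the paper's own
# semantic similarity measure is not reproduced here.
SYNONYMS = {"writer": "author", "heading": "title"}

def norm(tag):
    """Map a tag to a canonical name so synonymous tags compare equal."""
    return SYNONYMS.get(tag, tag)

def collect_fragments(elem, entities, out):
    """Step 1: append the lowest subtrees that contain every entity type."""
    found = False
    for child in elem:
        found = collect_fragments(child, entities, out) or found
    if not found and entities <= {norm(e.tag) for e in elem.iter()}:
        out.append(elem)
        found = True
    return found

def walk(elem, prefix=()):
    """Yield (normalized tag path, element) for every node under elem."""
    path = prefix + (norm(elem.tag),)
    yield path, elem
    for child in elem:
        yield from walk(child, path)

def similarity(paths1, paths2, w=0.5):
    """Step 2: weighted mix of tag-name overlap and tag-path overlap (Jaccard)."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 1.0
    tags1, tags2 = {p[-1] for p in paths1}, {p[-1] for p in paths2}
    return w * jaccard(tags1, tags2) + (1 - w) * jaccard(paths1, paths2)

def adaptive_threshold(sims):
    """Hypothetical rule: cut at the midpoint of the widest gap in sorted values."""
    vals = sorted(set(sims))
    if len(vals) < 2:
        return vals[0] if vals else 1.0
    return max((b - a, (a + b) / 2) for a, b in zip(vals, vals[1:]))[1]

def cluster(path_sets):
    """Single-link clustering: join fragments whose similarity reaches the threshold."""
    parent = list(range(len(path_sets)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    sims = {(i, j): similarity(path_sets[i], path_sets[j])
            for i, j in combinations(range(len(path_sets)), 2)}
    threshold = adaptive_threshold(sims.values())
    for (i, j), s in sims.items():
        if s >= threshold:
            parent[find(i)] = find(j)  # union the two clusters
    groups = {}
    for i in range(len(path_sets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def extract(root, entities):
    """Step 3: per cluster, the most common entity-path pattern plus all instances."""
    frags = []
    collect_fragments(root, entities, frags)
    results = []
    for group in cluster([{p for p, _ in walk(f)} for f in frags]):
        patterns, instances = Counter(), []
        for i in group:
            first = {}  # first occurrence of each entity type in the fragment
            for path, node in walk(frags[i]):
                if path[-1] in entities and path[-1] not in first:
                    first[path[-1]] = (path, (node.text or "").strip())
            ordered = [first[e] for e in sorted(entities)]
            patterns[tuple(p for p, _ in ordered)] += 1
            instances.append(tuple(t for _, t in ordered))
        results.append({"pattern": patterns.most_common(1)[0][0],
                        "instances": instances})
    return results

SAMPLE = """<db>
  <book><title>XML Mining</title><author>Li</author></book>
  <book><title>Data Webs</title><writer>Wang</writer></book>
  <conf><paper><heading>Rel Extraction</heading>
    <authors><author>Chen</author></authors></paper></conf>
</db>"""

results = extract(ET.fromstring(SAMPLE), {"author", "title"})
for r in results:
    print(r["pattern"], r["instances"])
```

On this invented sample, the two `book` fragments (one using the synonym `writer`) land in the same cluster and share the pattern `book/author`, `book/title`, while the more deeply nested `paper` fragment forms its own cluster with a different pattern. This illustrates why occurrence patterns are reported per cluster: the same relation can appear under several structures.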
Source: Journal of Software (软件学报), 2008, No. 6, pp. 1422-1427 (6 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: Supported by the Natural Science Foundation of Fujian Province of China under Grant No. A0510020 and the Int'l Science and Technology Cooperation Project of Fujian Province of China under Grant No. 20041014.
Keywords: relation information; XML document; similarity; clustering; occurrence pattern