
Research on Automatic Classification of XML Documents Based on Labeled Trees (基于标记树的XML文档自动分类研究) Cited by: 5

XML Documents Classification Based on Labeled Tree
Abstract This paper first describes how labeled trees are generated from XML documents and DTDs, and extends the notion of a node so that it covers not only elements but also connectors (operators), as the structure of a DTD requires. The elements of a labeled tree are then divided into three types: common elements, document elements, and DTD elements. Level weight and structure weight are introduced to measure an element's level and structural complexity, and concrete methods for computing them are given. On this basis, an algorithm is proposed for measuring the similarity between an XML document and a DTD; it is applied to the automatic classification of XML documents, and a formula for its time complexity is given. The experimental results show that the classification method achieves relatively high accuracy.
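Authors Pan Youneng (潘有能), Ding Nan (丁楠)
Source Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, PKU Core, 2007, No. 3, pp. 350-355 (6 pages)
Funding This work was supported by the Zhejiang University "Shuguang" Youth Project (205000.362221) and by a project funded by the Education Department of Zhejiang Province (205204.F30501).
Keywords text classification, XML documents, level weight, structure weight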
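
The abstract names the ingredients of the method (labeled trees, common / document / DTD elements, level weight, a document-to-DTD similarity) but does not give the formulas. The Python sketch below only illustrates the general shape of such a classifier and is not the authors' algorithm. Everything specific in it is an assumption: the halving `level_weight`, the ratio used in `similarity`, the hand-written `DTD_TREES` (the standard library has no DTD parser), and the sample document. It also ignores connector nodes and structure weight.

```python
import xml.etree.ElementTree as ET


def element_levels(root):
    """Map each element tag to the shallowest depth at which it occurs (root = 1)."""
    levels = {}
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        levels[node.tag] = min(depth, levels.get(node.tag, depth))
        stack.extend((child, depth + 1) for child in node)
    return levels


def level_weight(depth):
    # Assumed weighting (not from the paper): elements nearer the root count more.
    return 1.0 / (2 ** (depth - 1))


def similarity(doc_levels, dtd_levels):
    """Weighted share of common elements among all elements of both trees."""
    common = doc_levels.keys() & dtd_levels.keys()
    w_common = sum(level_weight(doc_levels[t]) for t in common)
    # Document elements: present only in the document tree.
    w_doc = sum(level_weight(d) for t, d in doc_levels.items() if t not in common)
    # DTD elements: present only in the DTD tree.
    w_dtd = sum(level_weight(d) for t, d in dtd_levels.items() if t not in common)
    total = w_common + w_doc + w_dtd
    return w_common / total if total else 0.0


def classify(xml_text, dtd_trees):
    """Return the name of the DTD whose tree is most similar to the document."""
    doc_levels = element_levels(ET.fromstring(xml_text))
    return max(dtd_trees, key=lambda name: similarity(doc_levels, dtd_trees[name]))


# Hypothetical DTD trees, written by hand as tag -> depth.
DTD_TREES = {
    "article": {"article": 1, "title": 2, "author": 2, "body": 2},
    "play": {"PLAY": 1, "TITLE": 2, "ACT": 2, "SCENE": 3, "SPEECH": 4},
}
SAMPLE_DOC = "<article><title>t</title><author>a</author><section/></article>"

print(classify(SAMPLE_DOC, DTD_TREES))  # article
```

In this toy setup the sample document shares three of its four elements with the `article` tree and none with the `play` tree, so it is assigned to `article`. A fuller version would derive the DTD trees from real DTD files and add a structure term.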

References (7)

  • 1 Pan Youneng. Research on automatic clustering of XML documents [J]. Journal of the China Society for Scientific and Technical Information, 2006, 25(2): 215-220. Cited by: 16
  • 2 Zheng Shihui, Zhou Aoying, Zhang Long. Similarity measure and structural index of XML documents [J]. Chinese Journal of Computers, 2003, 26(9): 1116-1122. Cited by: 28
  • 3 Su Xinning et al. Data Mining Theory and Technology [M]. Beijing: Scientific and Technical Documentation Press, 2003: 373.
  • 4 Elisa Bertino, Giovanna Guerrini, Marco Mesiti, Luigi Tosetto. Evolving a set of DTDs according to a dynamic set of XML documents [C] // Proceedings of the 8th International Conference on Extending Database Technology (EDBT 2002): 45-66. Cited by: 1
  • 5 Yuan Wang, David J. DeWitt, Jin-Yi Cai. X-Diff: an effective change detection algorithm for XML documents [C] // Proceedings of the 19th International Conference on Data Engineering (ICDE 2003): 519-530. Cited by: 1
  • 6 Sigmod XML data sets [OL]. [2006-08]. http://www.acm.org/sigmod/record/xml. Cited by: 1
  • 7 Shakespeare XML data sets [OL]. [2006-08]. http://metalab.unc.edu/bosak/xml/eg. Cited by: 1

