XML Document Clustering Research Based on Weighted Cosine Similarity (cited by: 10)
Abstract In practical applications, some structures in an XML (eXtensible Markup Language) document are changed frequently. To mine the knowledge contained in the structures that change often over an XML document's history, a method for discovering frequently changing structures is proposed. The method represents each XML document with a document vector model composed of a set of frequently changing structures, takes the proportion of occurrences of each frequently changing structure within a cluster as its weight, and clusters the XML documents using weighted cosine similarity. Experimental analysis shows that the frequently changing structures found in the historical changes of XML documents cluster those documents well. Clustering with weighted cosine similarity gives better precision, recall, and intra-cluster distance than clustering with unweighted cosine similarity.
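The abstract does not give the exact similarity formula, so the following is only a minimal sketch under stated assumptions. It assumes each document is a vector over a fixed list of frequently changing structures (a positive entry means the structure occurs in that document), that a structure's weight is the fraction of documents in a cluster containing it, and that the weights enter a common form of weighted cosine similarity, sim(x, y) = Σ wᵢxᵢyᵢ / (√(Σ wᵢxᵢ²) · √(Σ wᵢyᵢ²)). The function names `cluster_weights` and `weighted_cosine` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cluster_weights(cluster: Sequence[Sequence[float]]) -> List[float]:
    """Assumed weighting: fraction of documents in the cluster
    in which each structure occurs (entry > 0)."""
    if not cluster:
        raise ValueError("cluster must contain at least one document vector")
    n_docs = len(cluster)
    n_dims = len(cluster[0])
    return [
        sum(1 for doc in cluster if doc[i] > 0) / n_docs
        for i in range(n_dims)
    ]


def weighted_cosine(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> float:
    """Weighted cosine similarity of x and y under non-negative weights w.

    With all weights equal to 1 this reduces to ordinary cosine similarity.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    dot = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        # A vector with no weighted mass has no direction; treat as dissimilar.
        return 0.0
    return dot / (norm_x * norm_y)


# Illustrative data only: three documents over three structures.
cluster = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
weights = cluster_weights(cluster)
similarity = weighted_cosine([1, 1, 0], [1, 0, 1], weights)
```

In this sketch, structures that occur in most documents of a cluster contribute more to the similarity than rare ones, which is one plausible reading of using the in-cluster occurrence proportion as the weight.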
Source Journal of Jilin University (Information Science Edition), CAS, 2010, No. 1, pp. 68-76 (9 pages)
Funding Supported by the Jilin Province Science and Technology Development Plan Fund (20090704)
Keywords XML document clustering; weighted cosine similarity; frequently changing structures