A schema matching approach based on clustering and auxiliary dictionary
Abstract: To address schema conflicts in Chinese-language environments, a metadata-based schema matching method is proposed. First, a feature vector is extracted for each schema from the data dictionary, and the vectors are clustered so that semantically similar schemas fall into the same cluster. Second, for different schemas within the same cluster, the semantic similarity between attributes is computed with the help of an auxiliary dictionary. Finally, the results are filtered by combining several selection strategies, which yields a candidate match set for each attribute. Experimental results show that the method improves the efficiency of schema matching while also achieving high accuracy.
Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2013, No. 2, pp. 214-220 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Technology R&D Program of China (2009BAH42B02); National Natural Science Foundation of China (60873038, 60903080); Fundamental Research Funds for the Central Universities, Harbin Engineering University (100603).
Keywords: schema matching; clustering technique; auxiliary dictionary; semantic similarity
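
The three stages named in the abstract (cluster schemas by feature vector, score attribute pairs with an auxiliary dictionary, filter with combined selection strategies) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify the details: the toy schemas, the bag-of-tokens features, the single-pass cosine clustering, the synonym-set dictionary, the similarity scores, and the threshold plus top-k filter are all hypothetical stand-ins, not the paper's actual algorithm. English identifiers are used in place of Chinese attribute names for readability.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy schemas: schema name -> attribute names (not from the paper).
SCHEMAS = {
    "student": ["name", "birth_date", "major"],
    "pupil": ["full_name", "birthday", "major"],
    "invoice": ["amount", "issue_date", "payer"],
}

# Hypothetical auxiliary dictionary: each set groups terms treated as synonyms.
SYNONYMS = [{"name", "full_name"}, {"birth_date", "birthday"}]


def feature_vector(attributes):
    """Bag-of-tokens vector built from attribute names (an assumed feature choice)."""
    return Counter(tok for attr in attributes for tok in attr.split("_"))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_schemas(schemas, threshold=0.3):
    """Single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of schema names; cluster[0] is its seed
    for name, attrs in schemas.items():
        vec = feature_vector(attrs)
        for cluster in clusters:
            if cosine(vec, feature_vector(schemas[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def attribute_similarity(a, b):
    """1.0 for identical names, 0.9 for dictionary synonyms, else token Jaccard."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in SYNONYMS):
        return 0.9
    ta, tb = set(a.split("_")), set(b.split("_"))
    return len(ta & tb) / len(ta | tb)


def candidate_matches(source, target, threshold=0.5, top_k=2):
    """Filter by a similarity threshold, then keep at most top_k candidates."""
    result = {}
    for a in source:
        scored = [(attribute_similarity(a, b), b) for b in target]
        kept = sorted((s, b) for s, b in scored if s >= threshold)
        result[a] = [b for _, b in reversed(kept)][:top_k]
    return result


if __name__ == "__main__":
    for cluster in cluster_schemas(SCHEMAS):
        print("cluster:", cluster)
    print(candidate_matches(SCHEMAS["student"], SCHEMAS["pupil"]))
```

Clustering first means attribute-level comparison only runs between schemas in the same cluster, which is where the efficiency gain claimed in the abstract would come from; in this sketch the `invoice` schema is never compared attribute by attribute with `student` or `pupil`.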