A schema matching approach based on clustering and an auxiliary dictionary

Original title: 基于聚类和辅助词典的模式匹配方法 (cited by: 1)

Abstract: To address schema conflicts in Chinese-language environments, this paper proposes a metadata-based schema matching method. First, a feature vector is extracted for each schema from the data dictionary, and the vectors are clustered so that semantically similar schemas fall into the same cluster. Second, for different schemas within the same cluster, the semantic similarity between attributes is computed with the help of an auxiliary dictionary. Finally, several selection strategies are combined to filter the results, producing a candidate match set for each attribute. Experimental results show that the method improves the efficiency of schema matching while also achieving high accuracy.
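The three stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature extraction, the clustering algorithm, the contents of the auxiliary dictionary, or the selection strategies, so everything below is an assumption made for illustration: `AUX_DICT` is a hypothetical toy dictionary, schema features are approximated by sets of dictionary concepts, clustering is simple single-link grouping over Jaccard similarity, and filtering combines a similarity threshold with a top-k cut.

```python
from itertools import combinations

# Hypothetical auxiliary dictionary (an assumption, not the paper's resource):
# maps a Chinese attribute name to a set of concept labels.
AUX_DICT = {
    "姓名": {"person", "name"},        # "full name"
    "名字": {"person", "name"},        # "name"
    "电话": {"contact", "phone"},      # "phone"
    "联系电话": {"contact", "phone"},  # "contact phone"
    "地址": {"location", "address"},   # "address"
    "价格": {"money", "price"},        # "price"
}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def concepts(attr):
    """Concepts of an attribute; unknown attributes stand for themselves."""
    return AUX_DICT.get(attr, {attr})

def schema_features(attrs):
    """Stand-in for a feature vector: the union of the attributes' concepts."""
    feats = set()
    for attr in attrs:
        feats |= concepts(attr)
    return feats

def cluster_schemas(schemas, threshold=0.3):
    """Single-link clustering: schemas whose feature similarity reaches
    the threshold end up in the same cluster (union-find grouping)."""
    parent = {name: name for name in schemas}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(schemas, 2):
        sim = jaccard(schema_features(schemas[a]), schema_features(schemas[b]))
        if sim >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in schemas:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

def candidate_matches(src_attrs, tgt_attrs, threshold=0.5, top_k=2):
    """For each source attribute, keep target attributes whose dictionary-based
    similarity reaches the threshold, best first, at most top_k of them."""
    result = {}
    for a in src_attrs:
        scored = [(jaccard(concepts(a), concepts(b)), b) for b in tgt_attrs]
        kept = sorted((p for p in scored if p[0] >= threshold),
                      key=lambda p: (-p[0], p[1]))
        result[a] = [b for _, b in kept[:top_k]]
    return result

# Illustrative schemas (hypothetical table and attribute names).
SCHEMAS = {
    "customer": ["姓名", "电话", "地址"],
    "client": ["名字", "联系电话"],
    "product": ["价格"],
}

if __name__ == "__main__":
    print(cluster_schemas(SCHEMAS))
    print(candidate_matches(SCHEMAS["customer"], SCHEMAS["client"]))
```

Clustering first means attribute-level comparison only runs between schemas in the same cluster, which is where the efficiency gain claimed in the abstract would come from. In this toy setup `customer` and `client` are grouped together and `product` is never compared against them.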

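Authors: LIU Guofeng (刘国峰), HUANG Shaobin (黄少滨), CHENG Yuan (程媛), LANG Dapeng (郎大鹏)

Affiliation: College of Computer Science and Technology, Harbin Engineering University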

Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2013, No. 2, pp. 214-220 (7 pages)

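Funding: National Key Technology R&D Program of China (2009BAH42B02); National Natural Science Foundation of China (60873038, 60903080); Fundamental Research Funds for the Central Universities, Harbin Engineering University (100603)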

Keywords: schema matching; clustering technique; auxiliary dictionary; semantic similarity

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]


