Abstract
To address schema conflicts in Chinese-language environments, a metadata-based schema matching method is proposed. First, a feature vector is extracted for each schema from the data dictionary, and the vectors are clustered so that semantically similar schemas fall into the same cluster. Second, for the different schemas within a cluster, the semantic similarity between attributes is computed with the help of an auxiliary dictionary. Finally, the results are filtered by combining several selection strategies, producing a candidate match set for each attribute. Experimental results show that the proposed method improves the efficiency of schema matching while achieving high accuracy.
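The three stages named in the abstract (cluster schemas, score attribute pairs, filter candidates) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the features, the clustering algorithm, the similarity measure, or the selection strategies. The sketch assumes bag-of-terms feature vectors, a greedy threshold clustering on cosine similarity, Jaccard overlap of dictionary-normalised tokens, and a threshold plus top-k filter. The names `AUX_DICT`, `cluster`, `attr_similarity`, and `candidates`, the toy schemas, and the English underscore-separated attribute names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical auxiliary dictionary: maps a term to a canonical synonym.
AUX_DICT = {"surname": "name", "title": "name", "cost": "price"}

def tokens(attr):
    """Split an attribute name on underscores and normalise terms via AUX_DICT."""
    return {AUX_DICT.get(t, t) for t in attr.lower().split("_")}

def feature_vector(attrs):
    """Bag-of-terms vector for one schema (assumed stand-in for dictionary metadata)."""
    return Counter(t for a in attrs for t in tokens(a))

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def cluster(schemas, threshold=0.3):
    """Greedy one-pass clustering (an assumption): a schema joins the first
    cluster whose seed schema is at least `threshold` similar, else starts a new one."""
    vecs = {name: feature_vector(attrs) for name, attrs in schemas.items()}
    clusters = []  # each cluster is a list of schema names; element 0 is the seed
    for name in schemas:
        for c in clusters:
            if cosine(vecs[name], vecs[c[0]]) >= threshold:
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def attr_similarity(a, b):
    """Jaccard overlap of the normalised token sets of two attribute names."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def candidates(src, dst, threshold=0.3, top_k=2):
    """Combine two selection strategies: keep pairs above `threshold`,
    then keep at most `top_k` best-scoring targets per source attribute."""
    result = {}
    for a in src:
        scored = [(attr_similarity(a, b), b) for b in dst]
        kept = sorted((p for p in scored if p[0] >= threshold),
                      key=lambda p: (-p[0], p[1]))
        result[a] = [b for _, b in kept[:top_k]]
    return result

# Toy schemas, invented for illustration only.
schemas = {
    "customer": ["cust_name", "cust_id", "address"],
    "client": ["client_surname", "client_id", "address"],
    "product": ["item_cost", "item_code"],
}

if __name__ == "__main__":
    for group in cluster(schemas):
        print("cluster:", group)
    print(candidates(schemas["customer"], schemas["client"]))
```

Clustering first means attribute-level comparison only runs between schemas in the same cluster, which is where the efficiency gain described in the abstract would come from. In this toy setup `product` is never compared against `customer` or `client`.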
Source
Journal of Harbin Engineering University (《哈尔滨工程大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2013, No. 2, pp. 214-220 (7 pages)
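Funding
National Key Technology R&D Program of China (2009BAH42B02)
National Natural Science Foundation of China (60873038, 60903080)
Fundamental Research Funds for the Central Universities, Harbin Engineering University (100603)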
Keywords
schema matching
clustering technique
auxiliary dictionary
semantic similarity