
Similarity calculation method applied to dynamic heterogeneous web server system (cited by: 10)
Abstract: Classical string similarity calculation based on edit distance suffers from low computational efficiency and poor accuracy. To address this, an improved string similarity calculation method based on edit distance and the longest common substring is proposed; it introduces the longest common prefix and the longest common suffix and defines a new similarity formula. The method is applied to a dynamic heterogeneous web server system model built on heterogeneous platforms. Webpage tamper-detection experiments show that, compared with the classical algorithm and the classical formula, the improved method accommodates the system's inherent heterogeneity while improving both the accuracy and the efficiency of similarity calculation.
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal (北大核心), 2018, Issue 1, pp. 282-287 (6 pages)
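Funding: National Key Research and Development Program of China (2016YFB0800104); Scientific Research Program of the Shanghai Science and Technology Commission (14DZ1105300)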
Keywords: edit distance; string similarity; dynamic; heterogeneous; webpage tamper-proofing
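
The abstract names the building blocks (edit distance, longest common substring, longest common prefix and suffix) but does not give the paper's formula. The following is only a minimal sketch of how those pieces might fit together. The function names, the `weight` parameter, and the blended score are hypothetical illustrations, not the authors' definition. The one property relied on here that is certain is that stripping a shared prefix and suffix leaves the Levenshtein distance unchanged, which shrinks the dynamic-programming table when two page versions differ only in a small region.

```python
def edit_distance(a: str, b: str) -> int:
    """Classical Levenshtein distance using a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def trimmed_edit_distance(a: str, b: str) -> int:
    """Edit distance computed after removing the shared prefix and suffix."""
    p = common_prefix_len(a, b)
    a, b = a[p:], b[p:]
    # The suffix is measured on the remainder so it cannot overlap the prefix.
    s = common_prefix_len(a[::-1], b[::-1])
    if s:
        a, b = a[:-s], b[:-s]
    return edit_distance(a, b)


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, cur[j])
        prev = cur
    return best


def blended_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Hypothetical score in [0, 1]; NOT the formula defined in the paper."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    by_distance = 1 - trimmed_edit_distance(a, b) / longest
    by_substring = longest_common_substring(a, b) / longest
    return weight * by_distance + (1 - weight) * by_substring
```

For tamper detection, one plausible use would be to compare the responses returned by the heterogeneous servers for the same request and flag a page when its score against the others drops below a chosen threshold. That threshold is likewise an assumption and would need tuning against real pages.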
