Text Similarity Calculation Method Based on Character Comparison
Cited by: 1
Abstract: To address the problem of computing similarity between texts that contain repeated characters, a new method is proposed for obtaining the similarity between two texts. First, the number of repeated characters is counted from the results of single-character comparisons. Second, the interference caused by repeated characters is removed by analyzing the overall comparison results. The correct text similarity is then computed with a formula, and the method is extended to texts that mix single-byte and multi-byte characters. Finally, the algorithm is implemented in code for simulation analysis; multiple groups of tests show that the text similarity computed by this method agrees with the theoretical values.
Author: Wang Yadong (汪亚东), School of Instrument and Electronics, North University of China, Taiyuan, Shanxi 030051, China
Source: Computer Era (《计算机时代》), 2023, No. 6, pp. 87-91 (5 pages)
Keywords: natural language processing; text similarity; repeated characters; computing algorithm
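
The abstract does not give the paper's formula, so the following is only a minimal sketch of the general idea it describes, not the author's algorithm. It assumes a Dice-style score over character multisets: `naive_similarity` compares every character pair and over-counts repeated characters, while `char_similarity` caps each character's matches at its smaller occurrence count. Both function names and the choice of the Dice coefficient are assumptions made for illustration. Python strings are sequences of Unicode code points, so single-byte and multi-byte characters are each compared as one character.

```python
from collections import Counter


def naive_similarity(a: str, b: str) -> float:
    """Pairwise character comparison; repeated characters are over-counted."""
    if not a and not b:
        return 1.0
    # Every equal pair (x from a, y from b) counts as a hit, so a character
    # that repeats in both texts is counted more than once.
    hits = sum(1 for x in a for y in b if x == y)
    return 2 * hits / (len(a) + len(b))


def char_similarity(a: str, b: str) -> float:
    """Dice-style similarity over character multisets (illustrative only)."""
    if not a and not b:
        return 1.0
    # Counter intersection keeps min(count_a, count_b) per character, so a
    # repeated character can match at most as often as it occurs in the other text.
    matched = sum((Counter(a) & Counter(b)).values())
    return 2 * matched / (len(a) + len(b))


if __name__ == "__main__":
    print(naive_similarity("aab", "aa"))   # exceeds 1 because of repeated 'a'
    print(char_similarity("aab", "aa"))    # stays within [0, 1]
    print(char_similarity("文本abc", "文本abd"))  # mixed single- and multi-byte text
```

With the inputs `"aab"` and `"aa"`, the naive score exceeds 1, which is the kind of interference from repeated characters that the abstract refers to; the multiset version keeps the score within [0, 1]. This sketch ignores character order, which the paper's method may or may not take into account.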