Abstract
Recent approaches that compute sentence similarity from syntactic information such as dependency parses suffer from the high cost of manual annotation and from the low accuracy of automatic annotation, which degrades performance. To address these problems, two methods are proposed that build on existing sentence similarity algorithms and incorporate part-of-speech (POS) features. Both rely on high-precision automatic POS tagging: the first method uses POS information to adjust how much words of different parts of speech contribute to sentence similarity, and the second uses POS information to select the more important words in a sentence for the computation. In comparative experiments, the first method achieved the highest accuracy on the experimental tasks, while the second combined relatively good accuracy with faster computation. The results demonstrate the effectiveness of both methods.
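The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the underlying similarity (cosine between weighted sums of word vectors), the toy vectors in `VECTORS`, the weights in `POS_WEIGHTS`, the tag set in `KEY_POS`, and all function names are assumptions made only for illustration. Sentences are assumed to arrive already POS-tagged.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, illustration only).
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sleeps": [0.1, 0.9, 0.0],
    "rests": [0.2, 0.8, 0.1],
    "the": [0.3, 0.3, 0.3],
    "a": [0.3, 0.3, 0.3],
}

# Hypothetical POS weights: content words count more than function words.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "DET": 0.1}
DEFAULT_WEIGHT = 0.5  # assumed weight for tags not listed above

# Hypothetical set of "key" POS tags kept by the selection variant.
KEY_POS = {"NOUN", "VERB"}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(tagged_sentence, weight_of):
    """Weighted sum of word vectors for a list of (word, pos) pairs."""
    dim = len(next(iter(VECTORS.values())))
    total = [0.0] * dim
    for word, pos in tagged_sentence:
        vec = VECTORS.get(word)
        weight = weight_of(pos)
        if vec is None or weight == 0.0:
            continue  # skip out-of-vocabulary or zero-weight words
        total = [t + weight * x for t, x in zip(total, vec)]
    return total


def similarity_pos_weighted(s1, s2):
    """Idea 1: every word contributes, scaled by a weight for its POS tag."""
    def weight_of(pos):
        return POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
    return cosine(sentence_vector(s1, weight_of), sentence_vector(s2, weight_of))


def similarity_pos_filtered(s1, s2):
    """Idea 2: only words whose POS tag is in KEY_POS take part."""
    def weight_of(pos):
        return 1.0 if pos in KEY_POS else 0.0
    return cosine(sentence_vector(s1, weight_of), sentence_vector(s2, weight_of))


if __name__ == "__main__":
    first = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
    second = [("a", "DET"), ("dog", "NOUN"), ("rests", "VERB")]
    print(similarity_pos_weighted(first, second))
    print(similarity_pos_filtered(first, second))
```

In this sketch the filtered variant touches fewer words per sentence, which mirrors the speed advantage the abstract reports for the second method. In practice the weights and the key tag set would have to be tuned on data.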
Authors
WU Hao (吴浩)
Aishan Wumaier (艾山·吾买尔)
Kahaerjiang Abiderexiti (卡哈尔江·阿比的热西提)
WANG Lu-lu (王路路)
Tuergen Yibulayin (吐尔根·依布拉音)
Affiliations: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China; Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi 830046, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2020, Issue 1, pp. 150-155 (6 pages)
Funding
Open Project Fund of the Key Laboratory of Xinjiang Uygur Autonomous Region (2018D04019)
National Natural Science Foundation of China (61762084, 61331011, 61662077, 61462083)
Keywords
sentence similarity
POS
word2vec
word weight
semantic