Abstract
To address the low accuracy of user intent recognition on spoken Chinese short texts, whose lack of context, irregular grammar, and heavy noise make their semantics ambiguous, an intent recognition algorithm based on multi-feature fusion is proposed. The algorithm improves the traditional Bi-LSTM (Bi-directional Long Short-Term Memory) text classification algorithm by fusing the character vectors, word vectors, part-of-speech vectors, and entity knowledge base vectors of the original text. Combined with a character-level intent recognition model, it is trained and tested on a manually annotated user intent dataset collected from a real-world scenario. Experimental results show that the improved algorithm clearly improves accuracy and other evaluation metrics in the real-world scenario.
Authors
周权
陈永生
郭玉臣
ZHOU Quan; CHEN Yong-sheng; GUO Yu-chen (Department of Computer Science and Technology, Tongji University, Shanghai 201804, China)
Source
《电脑知识与技术》
2020, No. 21, pp. 28-31 (4 pages)
Computer Knowledge and Technology
Keywords
intent recognition
short text classification
multi-feature fusion
word embedding
deep learning
Bi-LSTM