Abstract
Comparative text is important for enterprises analyzing competing products, yet little research has addressed comparative text classification in the question-answering (Q&A) domain. Because Q&A texts are rich in comparative information and concentrated in topic, this paper proposes a comparative text classification method based on topic-feature and keyword-feature expansion. A pretrained topic model is used to infer the topic probability distribution of each Q&A text, which serves as its topic feature. To address the loss of keyword information caused by vector concatenation and summation, a GRU autoencoder is designed to extract features from the keyword vectors, and the encoder output is used as the keyword feature of the Q&A text. Combining topic information and keyword semantics, classification features are constructed from six perspectives: linguistic, product, sentiment, social, topic, and keyword. Several classifiers are then used to classify the Q&A texts. Experimental results show that the constructed features are effective and that comparative text classification performs well.
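The keyword-feature step can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the forward pass of a GRU encoder with random, untrained weights and no bias terms, and it omits the decoder and the reconstruction training that a full autoencoder needs. The names `GRUEncoder`, `vector_sum`, `keywords`, and `topic_dist`, and all numeric values, are hypothetical. The abstract does not say which keyword information is lost by summation; word order is used here only as one property that is easy to check.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUEncoder:
    """Forward pass of a single GRU layer (untrained, biases omitted)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(rng, hidden_dim, input_dim) for g in "zrh"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrh"}

    def _gate(self, name, x, h, activation):
        pre = zip(matvec(self.W[name], x), matvec(self.U[name], h))
        return [activation(a + b) for a, b in pre]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)
        r = self._gate("r", x, h, sigmoid)
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        cand = self._gate("h", x, gated_h, math.tanh)
        # Interpolate between the old state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, vectors):
        h = [0.0] * self.hidden_dim
        for x in vectors:
            h = self.step(x, h)
        return h  # final hidden state, used as the keyword feature


def vector_sum(vectors):
    return [sum(column) for column in zip(*vectors)]


# Hypothetical 3-dimensional keyword vectors for one Q&A text.
keywords = [[1.0, 0.0, 0.5], [0.0, 2.0, 1.0], [0.5, 0.5, 0.0]]
reversed_keywords = keywords[::-1]

encoder = GRUEncoder(input_dim=3, hidden_dim=4)
h_fwd = encoder.encode(keywords)
h_rev = encoder.encode(reversed_keywords)

# Hypothetical topic distribution inferred by a pretrained topic model.
topic_dist = [0.7, 0.2, 0.1]
features = topic_dist + h_fwd  # topic feature followed by keyword feature
```

In this toy setup `vector_sum` returns the same vector for both keyword orders, whereas the encoder states `h_fwd` and `h_rev` differ. In a trained autoencoder, the encoder weights would instead be learned by reconstructing the keyword sequence from the encoder output.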
Authors
丁勇
程家桥
蒋翠清
王钊
DING Yong; CHENG Jiaqiao; JIANG Cuiqing; WANG Zhao (School of Management, Hefei University of Technology, Hefei 230009, China; Key Laboratory of Process Optimization and Intelligent Decision-making of Ministry of Education, Hefei 230009, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2021, Issue 17, pp. 196-202 (7 pages)
Computer Engineering and Applications
Funding
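Key Program of the National Natural Science Foundation of China (71731005)
Humanities and Social Sciences Planning Fund of the Ministry of Education of China (15YJA630010)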
Keywords
topic model
autoencoder
feature expansion
comparative text classification