Abstract
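Aspect extraction is a subtask of fine-grained sentiment analysis whose goal is to extract, from review text, the aspects that users evaluate. Within a given domain, some aspects appear frequently across different reviews; these are called high-frequency aspects. High-frequency aspects represent the domain well and are easily perceived by supervised learning models. Low-frequency aspects, by contrast, occur rarely, so the training samples available for them are sparse. Neural network models therefore struggle to learn the corresponding linguistic phenomena, which makes low-frequency aspects hard to extract at test time. Because low-frequency aspects often co-occur with high-frequency aspects within the same local text span, this paper proposes an aspect extraction method that incorporates high-frequency aspect information: it tracks and records the high-frequency aspects identified by the model, encodes their context with a convolutional neural network and an attention mechanism, and fuses that context into the representation learning of the other tokens through a gating mechanism, thereby assisting the recognition of low-frequency aspects. Experiments were conducted on the laptop and restaurant domain datasets provided by SemEval 2014 and 2016. Compared with the baseline model, the proposed method improves F1 on these two English datasets by 2.33 and 1.44 percentage points, respectively, and its overall performance exceeds that of existing state-of-the-art methods.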
Aspect extraction is a subtask of fine-grained sentiment analysis that aims to extract the aspects on which users express opinions in review text. High-frequency aspects, which appear across many reviews, have strong domain representation ability and are easily perceived by supervised learning models. We propose an aspect extraction method that integrates high-frequency aspect information. We track and record the high-frequency aspects recognized by the model, encode their context information with a convolutional neural network and an attention mechanism, and integrate this information into the representation learning process through a gating mechanism. Experiments on two benchmark datasets, Laptop of SemEval-14 and Restaurant of SemEval-16, show F1 improvements of 2.33 and 1.44 percentage points, respectively, over the baseline models.
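The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the convolutional encoder, uses plain dot-product attention over a small memory of high-frequency aspect context vectors, and applies an element-wise sigmoid gate with fixed toy weights instead of learned parameters. The names `attend`, `gated_fuse`, `hf_memory`, and all numeric values are assumptions made for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Dot-product attention: weight each memory vector by its similarity
    to the query, then return the weighted sum of the memory vectors."""
    weights = softmax([sum(q * k for q, k in zip(query, vec)) for vec in memory])
    dim = len(query)
    return [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]

def gated_fuse(h, c, w_h, w_c, b):
    """Element-wise gate g = sigmoid(w_h*h + w_c*c + b).
    The output g*h + (1-g)*c mixes the token vector h with the context c."""
    fused = []
    for hi, ci, whi, wci, bi in zip(h, c, w_h, w_c, b):
        g = 1.0 / (1.0 + math.exp(-(whi * hi + wci * ci + bi)))
        fused.append(g * hi + (1.0 - g) * ci)
    return fused

# Hypothetical toy values: one token vector and two recorded
# high-frequency aspect context vectors.
token = [0.2, -0.1, 0.4]
hf_memory = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]

context = attend(token, hf_memory)
fused = gated_fuse(token, context, [0.5] * 3, [0.5] * 3, [0.0] * 3)
print(fused)
```

In a trained model the gate weights would be learned, so the network itself decides, per dimension, how much high-frequency aspect context flows into each token representation.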
Authors
潘雨晨
尉桢楷
洪宇
徐庆婷
姚建民
PAN Yuchen; YU Zhenkai; HONG Yu; XU Qingting; YAO Jianmin (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source
《中文信息学报》
CSCD
Peking University Core Journals
2023, No. 1, pp. 132-143 (12 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China (61672367, 61751206, 62076174).
Keywords
aspect extraction
high-frequency aspects
gated mechanism