Abstract
[Objective] In the early stage of a public health emergency, Weibo data are scarce and highly colloquial, which degrades sentiment analysis. To address this, we propose a Weibo sentiment analysis model based on prompt embedding and sentiment feature fusion. [Methods] Sentiment information is first extracted from Weibo texts using a purpose-built sentiment lexicon. The pre-trained RoBERTa model then produces semantic and sentiment vectors, and prompts are embedded as prefixes to the semantic vectors. A Transformer encoder and an attention mechanism extract the semantic and sentiment features, respectively. A focal loss function is then used to compute sample feature weights. Finally, the semantic and sentiment features are fused to produce the sentiment analysis result. [Results] On Weibo comments about the COVID-19 outbreak in Shenzhen, a public health emergency, the proposed model reached an accuracy of 93.46% and an F1 score of 93.49%, which are 6.78 and 6.97 percentage points higher, respectively, than the baseline BERT model. [Limitations] Weibo data contain many images and videos, but the model does not fuse multiple modalities for sentiment analysis. [Conclusions] By combining prompt embedding with sentiment feature fusion, the proposed model improves sentiment classification when sample data are scarce, and it offers a reference for similar sentiment analysis studies.
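The abstract names a focal loss function but gives neither its form nor its hyperparameters. As a rough illustration only, the sketch below uses the standard focal loss formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), to show how confidently classified samples are down-weighted relative to hard ones. The function name `focal_loss` and the defaults `gamma=2.0` and `alpha=1.0` are assumptions for illustration, not values taken from the paper.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for a single sample (illustrative sketch).

    probs  -- predicted class probabilities for the sample
    target -- index of the true class
    gamma  -- focusing parameter (assumed); gamma=0 gives plain cross-entropy
    alpha  -- scalar weighting factor (assumed)
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# An "easy" sample: the true class (index 1) is predicted with high confidence.
easy = focal_loss([0.05, 0.95], target=1)

# A "hard" sample: the true class receives only 0.30 probability.
hard = focal_loss([0.70, 0.30], target=1)

# With gamma=0 the modulating factor vanishes, leaving cross-entropy.
ce_hard = focal_loss([0.70, 0.30], target=1, gamma=0.0)

print(f"easy={easy:.6f} hard={hard:.6f} cross-entropy(hard)={ce_hard:.6f}")
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified samples far more than that of misclassified ones, which is presumably why such a loss helps when labeled data are scarce or imbalanced; how the authors apply it to sample feature weights is not specified in the abstract.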
Authors
Lai Yubin (赖宇斌)
Chen Yan (陈燕)
Hu Xiaochun (胡小春)
Huang Xin (黄欣)
Affiliations: School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning 530007, China; College of Information Engineering, Guangxi Vocational University of Agriculture, Nanning 530007, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
EI
CSCD
Peking University Core Journals (北大核心)
2023, No. 11, pp. 46-55 (10 pages)
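Funding
This work is one of the research outcomes of the Guangxi Scientific Research and Technology Development Program (Grant No. 桂科AA20302002-3) and the Guangxi Natural Science Foundation (Grant No. 2020GXNSFAA159090).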
Keywords
Prompt Embedding
Feature Fusion
Few Shot
Sentiment Analysis
Public Health Emergency