Abstract
Surveys indicate that chat messages and tweets are the data sets in which insulting words are most readily detected. This study collects several data sets containing insulting words and examines what results are obtained when judgments take the context of the data into account. In general, when the content is ambiguous, it must be interpreted within its specific context. BERT, SVM, and BiLSTM models are trained for context-based classification, and a comparison of the results shows that BERT-based context-aware classification is better suited to practical application scenarios of this kind.
Author
ZHOU Hanzhang (周瀚章), Guangdong Baiyun University, Guangzhou, Guangdong 510450
Source
Changjiang Information & Communications (《长江信息通信》)
2021, No. 11, pp. 72-74 (3 pages)
Funding
2021 Guangdong Baiyun University university-level research project (2021BYKY22).