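Abstract
[Purpose/significance] Online news is an important source of intelligence on emergencies. Improving the accuracy of emergency recognition and classification in massive volumes of online news, mitigating the open set recognition problem caused by non-emergency news, and reducing the cost of manually labeling non-emergency news are key topics in current research on emergency recognition and classification. [Method/process] A pre-trained BERT model is used to obtain feature representations of the text, and semantic information from different layers is fused to improve the quality of the text representation. An adaptive decision boundary model then learns the optimal spherical decision boundary of each emergency category in the high-dimensional semantic representation space. Based on the Euclidean distance between a news sample's text representation and the optimal spherical decision boundary of each emergency category, the model detects emergency news and determines the emergency category. The model's effectiveness is validated on the public CEC dataset and on CEN, a Chinese news dataset crawled in real time. [Result/conclusion] Experimental results show that the proposed model achieves Macro-F1 scores of 98.46% on CEC and 95.80% on CEN, improvements of 5.15% and 19.69%, respectively, over the baseline model. An application of the model demonstrates the effectiveness of the proposed method on a practical problem. [Limitations] The possibility that emergency news may carry multiple labels is not considered.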
[Purpose/significance] Online news is an important source of emergency intelligence. Research on emergency news recognition and classification focuses on increasing the accuracy of recognition and classification, reducing the open set recognition interference caused by non-emergency news, and reducing the cost of manually labeling non-emergency news. [Method/process] A pre-trained BERT model is used to obtain the feature representation of the text, and the quality of the text representation is enhanced by fusing semantic information from different layers. On this basis, an adaptive decision boundary model is proposed to learn the optimal spherical decision boundary of each emergency category in the high-dimensional semantic representation space. Emergency news is then detected and its category determined from the Euclidean distance between the text representation of a news sample and the optimal spherical decision boundary of each emergency category. The effectiveness of the model is validated on the CEC and CEN datasets. [Result/conclusion] The experimental results show that the model achieves Macro-F1 scores of 98.46% and 95.80% on the CEC and CEN datasets, respectively, improvements of 5.15% and 20.36%, respectively, over the baseline model. An application of the model demonstrates the effectiveness of the proposed method. [Limitations] The possibility that emergency news may carry multiple labels was not considered.
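The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes feature vectors are already available (standing in for the fused BERT representations), uses each class's mean vector as the center of its sphere, and sets the radius to the largest training distance from that center as a fixed stand-in for the learned adaptive boundary, whose training procedure the abstract does not describe. The function names, the class labels, and the toy 2-D data are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_boundaries(features, labels):
    """Return {label: (center, radius)} for every known emergency class.

    Assumption: the radius is the largest training distance to the center.
    The paper learns the radius adaptively; that step is omitted here.
    """
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    boundaries = {}
    for y, xs in by_class.items():
        c = centroid(xs)
        r = max(math.dist(x, c) for x in xs)
        boundaries[y] = (c, r)
    return boundaries


def predict(x, boundaries, unknown="non-emergency"):
    """Pick the nearest center; accept it only if x lies inside that sphere."""
    label, (c, r) = min(
        boundaries.items(), key=lambda kv: math.dist(x, kv[1][0])
    )
    return label if math.dist(x, c) <= r else unknown


# Toy 2-D features standing in for high-dimensional text representations.
features = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["earthquake", "earthquake", "fire", "fire"]
boundaries = fit_boundaries(features, labels)

print(predict([0.0, 1.5], boundaries))  # inside the "earthquake" sphere
print(predict([5.0, 5.0], boundaries))  # outside every sphere
```

A sample that falls outside every sphere is rejected as non-emergency news, which is how this kind of rule handles the open set case without needing labeled non-emergency examples for training.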
Source
《情报理论与实践》
CSSCI
PKU Core Journal
2023, No. 2, pp. 194-200 (7 pages)
Information Studies: Theory & Application
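Funding
An outcome of the National Social Science Fund of China Western Project "Multi-objective Optimization of the Emergency Response Process from the Perspective of Intelligence Process Reconstruction" (Grant No. 19XTQ010).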
Keywords
emergency events
adaptive decision boundary
open set recognition
text classification