Abstract
In the mobile internet era, information overload has fueled clickbait ("标题党", rendered in the original English abstract as "sensational headline writer") abuses such as the unauthorized gathering, editing, and publishing of online news and the spread of false information, and identifying clickbait has become an important task in current internet governance. This paper analyzes the core characteristics of clickbait on today's internet and examines five categories of it in detail. It compares the performance of several popular recognition algorithms and reports the recall and precision of each. It then proposes a rule-matching clickbait recognition algorithm that performs well on a corpus of mixed clickbait types and addresses limitations of existing recognition algorithms.
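The abstract names the technique (rule matching) and the evaluation metrics (recall and precision) but does not list the rules themselves. The sketch below only illustrates the general idea under stated assumptions: the rule names, the regular expressions, and the sample headlines are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical rules for illustration only; the paper's actual rule set
# is not given in the abstract.
RULES = {
    "exaggeration": re.compile(r"震惊|惊呆|shocking|unbelievable", re.IGNORECASE),
    "suspense": re.compile(r"竟然|居然|you won't believe|what happened next", re.IGNORECASE),
    "excess_punctuation": re.compile(r"[!！?？]{2,}"),
}


def match_rules(title):
    """Return the names of all rules whose pattern occurs in the title."""
    return [name for name, pattern in RULES.items() if pattern.search(title)]


def is_clickbait(title):
    """Flag a title as clickbait if at least one rule matches."""
    return bool(match_rules(title))


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up sample headlines with made-up labels (True = clickbait).
titles = [
    "震惊!这件事居然是真的",
    "City council approves 2021 budget",
    "You won't believe what happened next!!",
    "The secret that doctors hide",
]
labels = [True, False, True, True]

predictions = [is_clickbait(t) for t in titles]
precision, recall = precision_recall(predictions, labels)
```

On this toy data the last headline is labeled clickbait but matches no rule, so recall falls below precision. That is the usual trade-off of a hand-written rule set: a matched rule is strong evidence, but phrasing that no rule anticipates goes undetected.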
Author
YANG Xiaofeng (杨小峰), Zhongyuan Converging Media Technology Research Center, Zhengzhou 450007, China
Source
Modern Information Technology (《现代信息科技》)
2020, Issue 20, pp. 124-127 (4 pages)
Funding
2021 Henan Province Science and Technology Plan Project (21 2102210417).
Keywords
rule matching
automation
"sensational headline writer" ("标题党", clickbait) recognition
natural language processing