Abstract
Predicted category labels for text intelligence in the pan-entertainment domain form a directed acyclic graph (DAG). To exploit this label hierarchy, this paper proposes an optimal-path-based hierarchical multi-label classification method. The DAG is first built from the existing labels and converted into a tree, which is easier to process. A local strategy then trains a separate base classifier for each node of the tree and assigns each node a contribution value that combines the classifier's output probability with a hierarchical weight; a node is set to 1 if its contribution value exceeds a threshold and to 0 otherwise. A depth-first traversal of the tree generates candidate paths, each path is scored, and the highest-scoring path that satisfies the hierarchical constraints is taken as the final predicted label set. Four groups of experiments on a public pan-entertainment text dataset show that the method classifies better than classifier chains, binary analysis, SVM multi-label classification, and MLKNN.
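The path-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: the contribution value is taken as the classifier probability multiplied by a level weight that decays geometrically with depth, a path score is the sum of the contributions along the path, and the hierarchical constraint is read as "a node may be selected only if all its ancestors are selected". The label tree, the probabilities, and the names `TREE`, `PROBS`, `level_weight`, `contribution`, and `best_path` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label tree (parent -> children); "root" is a virtual node.
TREE: Dict[str, List[str]] = {
    "root": ["games", "film"],
    "games": ["mobile", "console"],
    "film": ["animation"],
}

# Hypothetical probabilities standing in for the per-node base classifiers.
PROBS: Dict[str, float] = {
    "games": 0.9, "mobile": 0.8, "console": 0.3,
    "film": 0.6, "animation": 0.2,
}


def level_weight(depth: int, decay: float = 0.75) -> float:
    """Assumed hierarchical weight: geometric decay, depth 1 = top level."""
    return decay ** (depth - 1)


def contribution(node: str, depth: int) -> float:
    """Assumed combination rule: classifier probability times level weight."""
    return PROBS[node] * level_weight(depth)


def best_path(tree: Dict[str, List[str]],
              threshold: float = 0.4) -> Tuple[List[str], float]:
    """Depth-first search for the highest-scoring path of thresholded nodes."""
    best: Tuple[List[str], float] = ([], 0.0)

    def dfs(node: str, depth: int, path: List[str], score: float) -> None:
        nonlocal best
        for child in tree.get(node, []):
            c = contribution(child, depth + 1)
            if c <= threshold:
                # Indicator is 0: prune the subtree, because selecting a
                # descendant would violate the hierarchical constraint.
                continue
            new_path, new_score = path + [child], score + c
            if new_score > best[1]:
                best = (new_path, new_score)
            dfs(child, depth + 1, new_path, new_score)

    dfs("root", 0, [], 0.0)
    return best


if __name__ == "__main__":
    path, score = best_path(TREE)
    print(path, round(score, 3))
```

With these made-up inputs the selected path is `games` then `mobile`, with a score of about 1.5; `console` and `animation` fall below the threshold and are pruned. A different weighting scheme or scoring rule would change which path wins, so the sketch should be adapted to the definitions given in the full paper.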
Authors
陈若愚
刘秀磊
于汝意
Chen Ruoyu; Liu Xiulei; Yu Ruyi (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; Laboratory of Data Science and Intelligence Analysis Research, Beijing Information Science and Technology University, Beijing 100101, China)
Source
《计算机应用与软件》
Peking University Core Journal (北大核心)
2023, No. 1, pp. 60-65 (6 pages)
Computer Applications and Software
Funding
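General Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201811232018)
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDDXN006)
Beijing Information Science and Technology University "Qinxin Talent" Cultivation Program (QXTCP C202111).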
Keywords
Hierarchical multi-label classification
Optimal path
Directed acyclic graph structure
Tree structure