A Hierarchical Multi-Label Classification Method for Pan-Entertainment Text

PAN-ENTERTAINMENT TEXT INFORMATION LEVEL MULTI-LABEL CLASSIFICATION
Abstract Category labels predicted for text intelligence in the pan-entertainment domain form a directed acyclic graph (DAG). To exploit this, the paper proposes an optimal-path hierarchical multi-label classification method that takes the label hierarchy into account. A DAG is built from the existing labels and converted into a tree, which is easier to process. Following a local strategy, a base classifier is trained for each node of the tree, and each node is assigned a contribution value that combines the classifier's output probability with a hierarchy-level weight; a node is set to 1 when its contribution value exceeds a threshold and to 0 otherwise. A depth-first traversal of the tree generates paths, each path is scored, and the highest-scoring path that satisfies the hierarchical constraints is selected as the final prediction set. Four groups of experiments on a public pan-entertainment text information data set show that the method classifies better than classifier chains, binary analysis, SVM multi-label classification, and the MLKNN algorithm.
Authors Chen Ruoyu (陈若愚); Liu Xiulei (刘秀磊); Yu Ruyi (于汝意) (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; Laboratory of Data Science and Intelligence Analysis Research, Beijing Information Science and Technology University, Beijing 100101, China)
Source Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2023, No. 1, pp. 60-65 (6 pages)
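Funding General Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KM201811232018); Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDDXN006); "Qinxin Talents" Cultivation Program of Beijing Information Science and Technology University (QXTCP C202111).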
Keywords Hierarchical multi-label classification; Optimal path; Directed acyclic graph structure; Tree structure
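
The path-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how probability and level weight are combined or how a path is scored, so the sketch assumes a product with a depth-decaying weight (`decay ** depth`) and a sum of contribution values along the path. The `Node` class, the `decay` and `threshold` values, the label names, and the probabilities are all hypothetical; `prob` stands in for the output of a per-node base classifier.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One label in the tree; `prob` stands in for a base classifier's output."""
    name: str
    prob: float
    children: list[Node] = field(default_factory=list)


def contribution(prob: float, depth: int, decay: float = 0.9) -> float:
    # Assumed combination rule: probability scaled by a depth-dependent weight.
    return prob * decay ** depth


def root_to_leaf_paths(node, depth=0, prefix=()):
    """Depth-first traversal yielding each root-to-leaf path as (node, depth) pairs."""
    path = prefix + ((node, depth),)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from root_to_leaf_paths(child, depth + 1, path)


def best_path(root, threshold=0.4, decay=0.9):
    """Return (labels, score) of the highest-scoring path that respects the hierarchy."""
    best_labels, best_score = [], 0.0
    for path in root_to_leaf_paths(root):
        labels, score = [], 0.0
        for node, depth in path:
            value = contribution(node.prob, depth, decay)
            if value <= threshold:
                break  # node is set to 0, so none of its descendants may be set to 1
            labels.append(node.name)
            score += value  # assumed path score: sum of contribution values
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Made-up probabilities for a tiny hypothetical label tree.
tree = Node("entertainment", 0.95, [
    Node("film", 0.80, [Node("animation", 0.70), Node("documentary", 0.20)]),
    Node("games", 0.45, [Node("mobile", 0.90)]),
])
labels, score = best_path(tree)
print(labels, round(score, 3))
```

Cutting a path at the first node whose contribution value does not exceed the threshold is one way to enforce the hierarchical constraint that a label is predicted only if all of its ancestors are. Summing contribution values favors longer paths; averaging would be an equally plausible choice that the abstract does not rule out.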
