Abstract
Traditional single-class and multi-class classification problems, in which each data point is assigned to exactly one category, have been studied extensively in data mining. In many applications, however, a data point may belong to more than one category, and the prevalence and importance of such multi-label data have drawn attention only in recent years. This kind of problem is called the Multi-Label Classification (MLC) problem. Because the labels are correlated, methods designed for traditional single-class and multi-class problems cannot be applied directly to MLC problems. This paper proposes a novel MLC algorithm based on the random walk model, called the Multi-Label Random Walk (MLRW) algorithm. First, the multi-label training data are mapped to a multi-label random walk graph. When an unlabeled data point arrives, a series of multi-label random walk graphs is built. The random walk model is then applied to each graph in the series to obtain a probability distribution over the vertices, which is converted into a probability distribution over the labels. Finally, a new threshold learning algorithm based on MLRW is proposed, so that a final prediction can be made for each label. Experimental results on real data sets show that the MLRW algorithm provides an effective solution to MLC problems.
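The pipeline outlined in the abstract (graph over the training data, random walk started from an incoming data point, vertex probabilities turned into label probabilities, threshold on each label) can be illustrated with a minimal sketch. This is not the MLRW algorithm itself, whose graph construction and threshold learning are not specified in the abstract. The sketch assumes a single k-nearest-neighbour graph rather than a graph series, a random walk with restart, and a fixed threshold instead of a learned one. All function names and parameters (`knn_graph`, `restart`, `threshold`, and so on) are hypothetical.

```python
import math


def knn_graph(points, k):
    """Link every training point to its k nearest neighbours (Euclidean)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph


def random_walk_with_restart(graph, start, restart=0.15, iters=100):
    """Power iteration; `start` is the restart distribution over vertices."""
    prob = list(start)
    for _ in range(iters):
        nxt = [restart * s for s in start]
        for i, outs in enumerate(graph):
            share = (1.0 - restart) * prob[i] / len(outs)
            for j in outs:
                nxt[j] += share
        prob = nxt
    return prob


def label_scores(points, label_sets, query, k=2, restart=0.15):
    """Turn vertex visiting probabilities into a distribution over labels."""
    graph = knn_graph(points, k)
    # Assumption: the walk restarts at the query's k nearest training points.
    nearest = sorted(range(len(points)),
                     key=lambda j: math.dist(query, points[j]))[:k]
    start = [0.0] * len(points)
    for j in nearest:
        start[j] = 1.0 / k
    visit = random_walk_with_restart(graph, start, restart)
    scores = {}
    for p, labels in zip(visit, label_sets):
        for lab in labels:
            scores[lab] = scores.get(lab, 0.0) + p
    total = sum(scores.values())
    return {lab: s / total for lab, s in scores.items()}


def predict(scores, threshold):
    """Fixed threshold; the paper learns its threshold instead."""
    return {lab for lab, s in scores.items() if s >= threshold}


# Toy data: two well-separated clusters with overlapping label sets.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
label_sets = [{"a"}, {"a", "b"}, {"a"}, {"c"}, {"c"}, {"c", "b"}]
scores = label_scores(points, label_sets, query=(0.1, 0.3))
print(predict(scores, threshold=0.2))
```

Because a training point may carry several labels, its visiting probability is credited to each of them, and the totals are renormalised so the label scores sum to one. The choice of threshold then controls how many labels are returned for the query, which is why a learned threshold matters in the multi-label setting.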
Source
《计算机学报》
EI
CSCD
PKU Core (北大核心)
2010, No. 8, pp. 1418-1426 (9 pages)
Chinese Journal of Computers
Funding
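Supported by:
National Natural Science Foundation of China (60803016)
National Basic Research Program of China (973 Program) (2007CB310802, 2009CB320706)
National High Technology Research and Development Program of China (863 Program) (2008AA042301, 2007AA040602)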
Keywords
multi-label
classification algorithm
random walk
threshold learning