Abstract
Ordinal regression, also known as ordinal classification, is a supervised learning task in which data items are classified using labels that have a natural order. It is closely related to many practical problems and has attracted growing research attention in recent years. Like other supervised learning tasks (classification, regression, etc.), ordinal regression relies on feature extraction to improve the efficiency and accuracy of the model. However, while feature extraction has been studied extensively and is widely used for classification tasks, it has received little attention in ordinal regression. Combined features are known to capture more of the underlying semantics of the data than single features, yet adding generic combined features rarely improves model accuracy. Building on frequent pattern mining, this paper uses the K-L divergence to select the most discriminative frequent patterns for feature combination and proposes a new combined-feature extraction method for ordinal regression. Experiments are conducted with multiple ordinal regression models on both public datasets and the authors' own dataset. The results show that combined features built from the most discriminative frequent patterns effectively improve the training results of most ordinal regression models.
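The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which distributions the K-L divergence compares, so this sketch assumes one plausible reading: each frequent pattern is scored by the divergence between the label distribution of the rows it covers and the overall label distribution. All function names, the brute-force pattern counting (used here in place of a real miner such as Apriori or FP-growth), and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log


def mine_frequent_patterns(rows, min_support=0.3, max_len=2):
    """Brute-force count of (feature_index, value) itemsets of length 2..max_len."""
    counts = Counter()
    for row in rows:
        items = list(enumerate(row))  # items are (feature_index, value) pairs
        for k in range(2, max_len + 1):
            counts.update(combinations(items, k))
    n = len(rows)
    return [pattern for pattern, c in counts.items() if c / n >= min_support]


def label_distribution(labels, classes):
    """Relative frequency of each class, in the order given by `classes`."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def matches(row, pattern):
    """True if the row contains every (feature_index, value) item of the pattern."""
    return all(row[i] == v for i, v in pattern)


def select_patterns(rows, labels, top_k=2, min_support=0.3):
    """Assumed scoring: K-L divergence of the covered rows' labels from the prior."""
    classes = sorted(set(labels))
    prior = label_distribution(labels, classes)
    scored = []
    for pattern in mine_frequent_patterns(rows, min_support):
        covered = [y for row, y in zip(rows, labels) if matches(row, pattern)]
        score = kl_divergence(label_distribution(covered, classes), prior)
        scored.append((score, pattern))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pattern for _, pattern in scored[:top_k]]


def add_combined_features(rows, patterns):
    """Append one binary indicator per selected pattern to every row."""
    return [list(row) + [int(matches(row, p)) for p in patterns] for row in rows]


# Toy data (hypothetical): two categorical features, ordinal labels 1 < 2 < 3.
rows = [("low", "no"), ("low", "no"), ("low", "yes"),
        ("high", "yes"), ("high", "yes"), ("high", "no")]
labels = [1, 1, 2, 3, 3, 2]

selected = select_patterns(rows, labels)
augmented = add_combined_features(rows, selected)
```

The augmented rows could then be passed to any ordinal regression model. Note that this scoring ignores the order of the labels; an order-aware score may be closer to what the paper actually does.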
Authors
ZENG Qing-tian (曾庆田)
LIU Chen-zheng (刘晨征)
NI Wei-jian (倪维健)
DUAN Hua (段华)
(College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, Shandong 266590, China)
Source
Computer Science (《计算机科学》)
Indexed in CSCD and the Peking University Core Journals list (北大核心)
2019, No. 6, pp. 69-74 (6 pages)
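Funding
National Natural Science Foundation of China (61472229, 61702306, 61602278, 61602279)
Shandong Province Science and Technology Development Project (2016ZDJS02A11, ZR2017BF015, ZR2017MF027)
Taishan Scholar Climbing Program of Shandong Province
Scientific Research Innovation Team Support Program of Shandong University of Science and Technology (2015TDJH102)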
Keywords
Ordinal regression
Frequent pattern
Feature combination
Feature selection