Abstract
This study builds a diagnostic model for type 2 diabetes and uses active learning to address the scarcity of labeled samples in medical data. Diagnosing type 2 diabetes can be treated as a cost-sensitive binary classification problem. Using logistic regression, support vector machine (SVM), and artificial neural network (ANN) models as base classifiers, the study adopts a cost-sensitive active learning method based on expected error reduction, which combines an active learning strategy with cost-sensitive classification and takes the differing misclassification costs into account when selecting instances to label. On the type 2 diabetes diagnosis task, this method performed best among the active learning strategies compared, reaching the lowest misclassification cost with fewer labeled instances. Active learning can therefore reduce the number of instances that must be labeled in medical data mining and lower labeling costs while preserving model performance.
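The abstract names the selection principle but gives no formulas. The following is only a minimal sketch of how cost-sensitive expected error reduction can work, under several assumptions that are not taken from the paper: a hand-written logistic regression stands in for the base model (the SVM and ANN variants are not shown), the cost matrix values are invented, and the function names (`fit_logreg`, `expected_cost`, `select_query`) and the toy one-feature data are hypothetical. For each unlabeled candidate, the sketch retrains the model once per possible label, estimates the expected misclassification cost on the remaining pool, weights both outcomes by the current model's predicted probabilities, and queries the candidate with the lowest expected cost.

```python
import math

# Assumed cost matrix, COST[true_label][predicted_label]. Missing a diabetic
# patient (true=1, predicted=0) is assumed to cost 5x a false alarm.
COST = [[0.0, 1.0], [5.0, 0.0]]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """P(y=1 | x); the last entry of w is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def fit_logreg(X, y, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient descent for L2-regularised logistic regression."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d + 1):
            reg = l2 * w[j] if j < d else 0.0  # bias is not regularised
            w[j] -= lr * (grad[j] / n + reg)
    return w

def expected_cost(w, pool):
    """Mean over the pool of the lowest expected cost of any prediction."""
    total = 0.0
    for x in pool:
        p1 = predict_proba(w, x)
        p = (1.0 - p1, p1)
        total += min(sum(p[t] * COST[t][k] for t in (0, 1)) for k in (0, 1))
    return total / len(pool)

def select_query(X_lab, y_lab, pool):
    """Index of the pool instance whose label is expected to cut cost most."""
    w = fit_logreg(X_lab, y_lab)
    best_i, best_score = None, float("inf")
    for i, x in enumerate(pool):
        p1 = predict_proba(w, x)
        rest = pool[:i] + pool[i + 1:]
        score = 0.0
        for label, p in ((0, 1.0 - p1), (1, p1)):
            # Retrain as if the oracle had returned this label.
            w_new = fit_logreg(X_lab + [x], y_lab + [label])
            score += p * expected_cost(w_new, rest)
        if score < best_score:
            best_i, best_score = i, score
    return best_i

# Toy data: one hypothetical scaled feature per patient.
X_lab, y_lab = [[0.0], [1.0]], [0, 1]
pool = [[0.1], [0.5], [0.9]]
print("query index:", select_query(X_lab, y_lab, pool))
```

Retraining once per candidate and per label is expensive, so a practical version would likely subsample the pool or use incremental updates; the sketch keeps the full loop to make the selection criterion visible.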
Author
XU Zhi-biao (许智彪)
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Source
Computer and Modernization (《计算机与现代化》)
2018, No. 6, pp. 84-90 (7 pages)
Keywords
diabetes
diagnostic model
cost-sensitive classification
active learning
logistic regression
support vector machine
artificial neural network