Funding: Supported by GSU Molecular Basis of Disease Graduate Fellow, 2011-2012
Abstract: Imbalanced data is a common and serious problem in many biomedical classification tasks. It biases classifier training and lowers prediction accuracy on minority classes. The problem has attracted considerable research interest over the past decade, but most efforts concentrate only on two-class problems. In this paper, we study a new way of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution effectively mitigates the problem when the datasets are noisy and highly imbalanced.
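The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the inverse-frequency class costs, the truncation parameter `s`, and all function names are assumptions chosen for illustration. The sketch computes a Crammer and Singer style margin (true-class score minus the best rival score), truncates the hinge loss into a ramp so that a noisy outlier contributes a bounded loss, and scales the result by a per-class cost.

```python
from collections import Counter


def class_costs(labels):
    """Assumed weighting: cost inversely proportional to class frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def multiclass_margin(scores, y):
    """Crammer-Singer style margin: true-class score minus best rival score.

    `scores` is a list indexed by class id 0..k-1 (k >= 2).
    """
    rival = max(s for r, s in enumerate(scores) if r != y)
    return scores[y] - rival


def ramp_loss(margin, s=-1.0):
    """Hinge loss max(0, 1 - margin), truncated at 1 - s (requires s < 1)."""
    hinge = max(0.0, 1.0 - margin)
    return min(hinge, 1.0 - s)


def cost_sensitive_ramp(scores, y, costs, s=-1.0):
    """Ramp loss of one example, scaled by the cost of its true class."""
    return costs[y] * ramp_loss(multiclass_margin(scores, y), s)
```

With eight examples of class 0 and two of class 1, this weighting gives the minority class four times the cost of the majority class, while a badly misclassified point is capped at a loss of `1 - s` before weighting, which is what limits the influence of noisy samples.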
Funding: Supported in part by the Natural Science Foundation of Henan Province (No. 14A520042), the Scientific Research Foundation of the Higher Education Institutions of Henan Province (No. 18A520021), the National Natural Science Foundation of China (No. 61802114), and the National Key Technology R&D Program of China (No. 2015BAK01B06).
Abstract: In recent years, rapid developments in bioinformatics technologies have led to the accumulation of a large amount of biomedical data. These data can be analyzed to enhance the assessment of at-risk patients and to improve disease diagnosis, treatment, and prevention. However, the datasets usually have many features, much of which is irrelevant or redundant information. Feature selection addresses this by searching for the optimal feature subset, which is known to be an NP-hard problem because of the large search space. Considering this, a new feature selection approach based on the Binary Chemical Reaction Optimization (BCRO) algorithm and the k-Nearest Neighbors (KNN) classifier is presented in this paper. Tabu search is integrated into the CRO framework to enhance local search capacity, and KNN is adopted to evaluate the quality of each selected candidate subset. Experimental results on nine standard medical datasets demonstrate that the proposed approach outperforms other state-of-the-art methods.
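The idea of using KNN to score a candidate feature subset can be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the leave-one-out protocol, squared Euclidean distance, and the names `knn_predict` and `subset_fitness` are hypothetical, and the BCRO and tabu search components are not shown. A candidate subset is encoded as a binary mask, and its fitness is the KNN accuracy obtained with only the selected features.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def subset_fitness(X, y, mask, k=3):
    """Leave-one-out KNN accuracy using only features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    hits = 0
    for i in range(len(Xs)):
        # Hold out row i and classify it from the remaining rows.
        rest_X = Xs[:i] + Xs[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, Xs[i], k) == y[i]
    return hits / len(Xs)
```

A binary search procedure would call `subset_fitness` on each candidate mask and keep the masks that score best; in practice the fitness often also penalizes the number of selected features, which this sketch leaves out.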
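Abstract: Discovering disease-related genes in high-dimensional biomedical data is currently a hot research topic, but most biomedical data contain many features that are irrelevant or redundant for finding disease genes, which makes the data difficult to use directly. To address this problem, an adaptive dual-population hybrid krill herd and black hole algorithm (modified binary krill herd and black hole algorithm, MBKHA) is proposed. The algorithm combines an improved binary krill herd algorithm with a binary black hole algorithm: the krill herd algorithm is responsible for finding better solution sets, and the black hole algorithm is responsible for accelerating convergence. An adaptive partitioning rule dynamically adjusts the numbers of krill individuals and star individuals in the population, so that the two algorithms complement each other's strengths. On five public medical microarray datasets, the proposed method is compared with other feature selection algorithms across multiple metrics, and the experimental results show that it achieves better feature selection performance.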