Quality Assessment of Training Data with Uncertain Labels for Classification of Subjective Domains

Abstract: To improve the performance of classifiers in subjective domains, this paper defines a metric, computed by means of K-means clustering, that measures the quality of subjectively labelled training data (QoSTD). The QoSTD is then used as a weight on the predicted class scores to adjust the likelihoods of instances. In addition, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. To verify the effectiveness of the proposed method, binary classifiers of Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set using support vector machine (SVM) and discriminant analysis (DA) models. The experimental results show that the consistency between the likelihoods of instances and the corresponding observations increases notably for the classes, especially when the training data set has a relatively low QoSTD. The results also suggest how mislabelled instances can be eliminated from the training data set before the classifiers are retrained in subjective domains.
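The abstract does not give the QoSTD formula, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that data quality can be approximated by how well K-means clusters agree with the subjective labels (cluster purity), and that the adjustment is a simple multiplication of a predicted class score by that quality value. The function names `kmeans`, `label_cluster_agreement`, and `adjust_score`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means with a deterministic start (evenly spaced points)."""
    step = len(points) // k
    centers = [list(points[i * step]) for i in range(k)]
    assign = [0] * len(points)
    for it in range(iters):
        # Assign every point to its nearest center.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if it > 0 and new_assign == assign:
            break  # assignments are stable, so the centers will not move
        assign = new_assign
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def label_cluster_agreement(points, labels, k=2):
    """ASSUMED quality proxy: cluster purity of the labels, a value in (0, 1]."""
    assign = kmeans(points, k)
    agreeing = 0
    for c in range(k):
        cluster_labels = [lab for lab, a in zip(labels, assign) if a == c]
        if cluster_labels:
            # Count the instances carrying the cluster's majority label.
            agreeing += Counter(cluster_labels).most_common(1)[0][1]
    return agreeing / len(labels)


def adjust_score(score, quality):
    """ASSUMED adjustment: use the quality value as a weight on a class score."""
    return quality * score


# Toy data: two well-separated groups, with one label that disagrees
# with its neighbours (the third instance).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 1, 1, 1, 1]

quality = label_cluster_agreement(points, labels)
adjusted = adjust_score(0.9, quality)
```

Under these assumptions, a training set whose labels disagree with its cluster structure receives a lower quality value, which in turn shrinks the classifier's scores. The instance whose label differs from its cluster's majority is also a natural candidate for removal before retraining.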

Author: Ying Dai

Affiliation: Faculty of Software and Information Science

Source: Journal of Computer and Communications, 2017, Issue 7, pp. 152-168 (17 pages)

Keywords: Quality Assessment; Subjective Domain; Multimodal Sensor Data; Label Noise; Likelihood Adjusting; TCM Zheng

Classification code: R73 [Medicine and Health: Oncology]
