Abstract
To improve the performance of classifiers in subjective domains, this paper defines a metric, computed by means of K-means clustering, that measures the quality of subjectively labelled training data (QoSTD). The QoSTD is then used as a weight on the predicted class scores to adjust the likelihoods of instances. In addition, two measurements are defined to assess the performance of classifiers trained on subjectively labelled data. To verify the effectiveness of the proposed method, binary classifiers for Traditional Chinese Medicine (TCM) Zhengs are trained and retrained on a real-world data set using support vector machine (SVM) and discriminant analysis (DA) models. The experimental results show that the consistency between the likelihoods of instances and the corresponding observations increases notably across the classes, especially when the training data set has a relatively low QoSTD. The results also indicate how mislabelled instances can be eliminated from the training data set so that classifiers in subjective domains can be retrained.
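The abstract does not state how QoSTD is computed from the K-means clusters or how exactly it weights the class scores. As a purely hypothetical illustration, not the paper's definition, the sketch below treats QoSTD as the cluster purity of the subjective labels (the fraction of instances whose label matches the majority label of their K-means cluster), scales predicted class scores by that value, and drops instances that disagree with their cluster's majority as one possible cleaning step before retraining. The function names `kmeans`, `qostd`, `weighted_scores` and `drop_inconsistent`, the purity formula, the multiplicative weighting and the deterministic seeding are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
    )


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's K-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [_nearest(p, centroids) for p in points]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def _majorities(labels: Sequence[int], assign: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each cluster to (majority label, number of members carrying it)."""
    out = {}
    for c in set(assign):
        member_labels = [lab for lab, a in zip(labels, assign) if a == c]
        out[c] = Counter(member_labels).most_common(1)[0]
    return out


def qostd(points: Sequence[Point], labels: Sequence[int], k: int = 2) -> float:
    """Hypothetical QoSTD: purity of the subjective labels w.r.t. K-means clusters."""
    assign = kmeans(points, k)
    agree = sum(count for _, count in _majorities(labels, assign).values())
    return agree / len(labels)


def weighted_scores(scores: Sequence[float], quality: float) -> List[float]:
    """Hypothetical weighting: scale predicted class scores by the QoSTD value."""
    return [quality * s for s in scores]


def drop_inconsistent(points: Sequence[Point], labels: Sequence[int], k: int = 2):
    """Hypothetical cleaning: drop instances whose label differs from their cluster's majority."""
    assign = kmeans(points, k)
    major = _majorities(labels, assign)
    keep = [i for i, a in enumerate(assign) if labels[i] == major[a][0]]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data: two well-separated groups; one instance in the first group is labelled 1.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = [0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    q = qostd(X, y)
    print(round(q, 3))  # 0.833
    print(weighted_scores([0.9, 0.3], q))
    print(drop_inconsistent(X, y)[1])  # [0, 0, 1, 1, 1]
```

On the toy data the outlying label in the first group lowers the purity to 5/6, and the cleaning step removes exactly that instance. A real implementation would need the paper's own QoSTD formula and weighting rule, and would likely use a library K-means with multiple restarts rather than this fixed seeding.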