Abstract
Building a binary classifier normally requires two sets of samples: positive examples and negative examples. In practice, many biological databases, such as kinase inhibitor databases, are not standard datasets of this kind. They contain only an incomplete set of positive samples together with unlabeled samples, and the unlabeled samples include both positives and negatives. This paper addresses the problem of building a standard binary classifier from such non-standard data in order to screen for unknown kinase inhibitors. Unknown kinase inhibitors are predicted from the probability output for the unlabeled samples. The paper also estimates the performance of this PU (positive-unlabeled) learning algorithm, and the results show that it achieves high predictive performance.
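The abstract does not say which PU learning procedure the paper uses, so the following is only a minimal sketch of one common approach, the estimator of Elkan and Noto, chosen here as an assumption. A classifier is first trained to separate labeled positives from unlabeled samples, and its probability output is then rescaled by an estimated labeling frequency `c`. This rescaling is valid only if the labeled positives are a random subset of all positives. The one-dimensional synthetic data, the function names (`fit_logistic`, `pu_posterior`), and all parameter values are hypothetical and unrelated to any kinase inhibitor database.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ss, lr=0.1, epochs=500):
    """Fit g(x) = sigmoid(w*x + b) to labels s by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def pu_posterior(x, w, b, c):
    """Estimate P(y=1 | x) as g(x) / c, clipped to at most 1."""
    return min(1.0, sigmoid(w * x + b) / c)


# Synthetic toy data (assumption): positives centred at +2, negatives at -2.
rng = random.Random(0)
positives = [rng.gauss(2.0, 1.0) for _ in range(200)]
negatives = [rng.gauss(-2.0, 1.0) for _ in range(200)]

# Only some positives carry a label; the rest are hidden in the unlabeled set.
labeled = positives[:80]
unlabeled = positives[80:] + negatives

# Hold out part of the labeled positives to estimate c = P(labeled | positive).
train_labeled, holdout = labeled[:60], labeled[60:]

# Step 1: train "labeled vs. unlabeled" (s = 1 for labeled, s = 0 otherwise).
xs = train_labeled + unlabeled
ss = [1] * len(train_labeled) + [0] * len(unlabeled)
w, b = fit_logistic(xs, ss)

# Step 2: estimate c as the mean classifier output on held-out labeled positives.
c = sum(sigmoid(w * x + b) for x in holdout) / len(holdout)

# Step 3: rescale the probability output to score unlabeled candidates.
scores = {x: pu_posterior(x, w, b, c) for x in (-3.0, 0.0, 3.0)}
```

In a screening setting, unlabeled compounds would be ranked by this rescaled score, with the highest-scoring ones treated as candidate positives. A real application would use a multi-dimensional compound representation and a stronger base classifier than this one-feature logistic model.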
Source
《信息通信》
2016, No. 7, pp. 53-55 (3 pages)
Information & Communications