Abstract
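In recent years, multi-source data fusion has become a hot topic in protein function prediction. This paper proposes a multi-source data fusion method based on the Choquet fuzzy integral for predicting the functions of yeast proteins. Support vector machines serve as base classifiers for each data source and output their results in probabilistic form. A particle swarm algorithm determines the fuzzy densities, and the Choquet fuzzy integral fuses the results from the individual data sources. Experiments show that protein function prediction with the Choquet fuzzy integral is clearly better than the traditional weighted average method, the support vector machine method, and the K-nearest-neighbors method.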
Predicting protein function is one of the main problems of the post-genomic era, and the availability of large amounts of biological data makes it feasible. In many cases, however, biological data obtained through biotechnology are highly noisy, and a single data source generally provides useful information for only a subset of the protein function classes. Data fusion, which uses diverse biological data to predict protein function, has therefore attracted broad interest in recent years. Compared with the common weighted-average approach to information fusion, a fuzzy measure can reflect not only the importance of different objects but also the interactions among them. In this paper, Choquet fuzzy integral fusion based on a fuzzy measure is used to integrate the probabilistic outputs of different classifiers, and a particle swarm algorithm is adopted to search for optimized values of the fuzzy densities, which are crucial to the fuzzy integral. Six data sets are used. The first five are collected from open databases or computed with software from those databases, and the sixth is the union of the first five. Probabilistic support vector machines are applied as base learners to predict the functions of the examples in each data set. The Choquet fuzzy integral method is then applied to the base learners' probabilistic outputs on the first five data sets. The Choquet fuzzy integral method is compared with the weighted average method, the support vector machine method, and the K-nearest-neighbors method, using ten-fold cross-validation. The experimental results show that the Choquet fuzzy integral method performs much better and that data fusion methods combining multiple types of biological data can substantially improve the results.
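The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the fuzzy measure is a Sugeno λ-measure built from the fuzzy densities, which is a common construction but is not confirmed by the abstract. The function names `solve_lambda` and `choquet_integral` are hypothetical. The densities are taken as given here, and the particle swarm search that would tune them is not shown.

```python
import numpy as np


def solve_lambda(densities, iters=200):
    """Find lambda for a Sugeno lambda-measure (an assumed construction).

    Solves prod(1 + lam * g_i) = 1 + lam for the non-trivial root lam > -1.
    Assumes at least two densities, each strictly between 0 and 1.
    """
    g = np.asarray(densities, dtype=float)
    if g.size < 2 or np.any(g <= 0.0) or np.any(g >= 1.0):
        raise ValueError("need at least two densities, each in (0, 1)")
    total = g.sum()
    if abs(total - 1.0) < 1e-4:
        return 0.0  # densities (nearly) sum to 1: the measure is additive

    def f(lam):
        return np.prod(1.0 + lam * g) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-8      # root lies in (-1, 0)
    else:
        lo, hi = 1e-8, 1.0        # root lies in (0, inf)
        while f(hi) < 0.0:
            hi *= 2.0
    for _ in range(iters):        # plain bisection on the bracket
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def choquet_integral(scores, densities, lam=None):
    """Fuse per-source scores for one class with the Choquet integral.

    scores[i] is the probability reported by the classifier on source i,
    densities[i] is the fuzzy density (importance) of source i.
    """
    h = np.asarray(scores, dtype=float)
    g = np.asarray(densities, dtype=float)
    if lam is None:
        lam = solve_lambda(g)
    order = np.argsort(-h)                 # sources by descending score
    h_sorted = np.append(h[order], 0.0)    # h_(n+1) = 0 by convention
    g_sorted = g[order]
    measure = 0.0                          # g(A_i), grown one source at a time
    fused = 0.0
    for i in range(g_sorted.size):
        measure = measure + g_sorted[i] + lam * measure * g_sorted[i]
        fused += (h_sorted[i] - h_sorted[i + 1]) * measure
    return fused


# Hypothetical scores from three sources for one function class.
fused_score = choquet_integral([0.9, 0.1, 0.4], [0.2, 0.3, 0.4])
```

When the densities sum to 1, λ is 0, the measure is additive, and the Choquet integral reduces to the ordinary weighted average. The interaction between sources appears only when the densities sum to something other than 1.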
Source
《南京大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2012, No. 1, pp. 63-69 (7 pages)
Journal of Nanjing University(Natural Science)
Funding
Jilin Province Science and Technology Development Project (20090501)