Abstract
Predicting gene-disease associations has become a focus of current biomedical research. Most existing prediction methods suffer from the sparsity of known gene-disease associations and from the PU (positive and unlabeled) problem. To address these shortcomings, this paper proposes KIMC (Katz method to boost inductive matrix completion), a gene-disease association prediction model with two steps: pre-estimation based on the Katz method and refined estimation based on inductive matrix completion (IMC). First, the Katz method is applied to a gene-disease heterogeneous network to pre-estimate gene-disease associations, which is intended to alleviate the effects of association sparsity and the PU problem. However, because its output depends on the quality of the similarity networks, the Katz method inevitably introduces some noise into these pre-estimates. To counter this, elastic-net regularization is introduced into the conventional IMC model to enhance its robustness, and the improved IMC model is then used to refine the gene-disease association predictions. Experimental results on real datasets show that, compared with several popular gene-disease association prediction methods, the proposed model achieves significantly higher precision and recall, and that it can also address the cold-start problem common in association prediction.
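The abstract describes the two KIMC steps only in words. The sketch below illustrates what such a pipeline could look like; it is not the authors' implementation, and several details are assumptions: the heterogeneous adjacency layout `A = [[G, P], [P^T, D]]`, a Katz sum truncated at walk length `K` with damping `beta`, an elastic-net penalty placed on the IMC factors `W` and `H`, an alternating proximal-gradient solver, and the use of the similarity matrices themselves as side features. All function names and default parameter values are hypothetical.

```python
import numpy as np


def katz_preestimate(P, G, D, beta=0.01, K=3):
    """Truncated Katz scores on a gene-disease heterogeneous network.

    Assumed (hypothetical) layout: A = [[G, P], [P^T, D]], with G a
    gene-gene similarity matrix, D a disease-disease similarity matrix
    and P the binary gene-disease association matrix. Walks of length
    1..K are summed with weight beta**k; the gene-disease block of the
    sum is returned as the pre-estimate.
    """
    n_g = G.shape[0]
    A = np.block([[G, P], [P.T, D]])
    S = np.zeros_like(A, dtype=float)
    Ak = np.eye(A.shape[0])
    for k in range(1, K + 1):
        Ak = Ak @ A                  # A^k: weighted walks of length k
        S += beta ** k * Ak
    return S[:n_g, n_g:]             # gene-disease block only


def soft_threshold(V, t):
    """Proximal operator of t * ||V||_1 (element-wise shrinkage)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)


def objective(R, X, Y, W, H, lam1, lam2):
    """Squared loss plus elastic-net penalty on both factors."""
    E = X @ W @ H.T @ Y.T - R
    return (0.5 * np.sum(E ** 2)
            + 0.5 * lam2 * (np.sum(W ** 2) + np.sum(H ** 2))
            + lam1 * (np.abs(W).sum() + np.abs(H).sum()))


def imc_elastic_net(R, X, Y, rank=5, lam1=0.01, lam2=0.1, iters=200, seed=0):
    """Fit R ~ X W H^T Y^T with an (assumed) elastic-net penalty on W, H.

    Each block update is one proximal-gradient step: a gradient step on
    the smooth part (squared loss + L2 term) with step size 1/L, where L
    bounds the block's Lipschitz constant, then soft-thresholding for the
    L1 term. This makes the objective non-increasing.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(iters):
        # Update W with H fixed.
        YH = Y @ H
        E = X @ W @ YH.T - R
        L = np.linalg.norm(X, 2) ** 2 * np.linalg.norm(YH, 2) ** 2 + lam2
        W = soft_threshold(W - (X.T @ E @ YH + lam2 * W) / L, lam1 / L)
        # Update H with W fixed.
        XW = X @ W
        E = XW @ H.T @ Y.T - R
        L = np.linalg.norm(Y, 2) ** 2 * np.linalg.norm(XW, 2) ** 2 + lam2
        H = soft_threshold(H - (Y.T @ E.T @ XW + lam2 * H) / L, lam1 / L)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_g, n_d = 8, 6
    P = (rng.random((n_g, n_d)) < 0.2).astype(float)  # sparse toy associations
    Sg = rng.random((n_g, n_g)); G = (Sg + Sg.T) / 2    # toy symmetric similarities
    Sd = rng.random((n_d, n_d)); D = (Sd + Sd.T) / 2
    R = katz_preestimate(P, G, D)                        # step 1: pre-estimate
    W, H = imc_elastic_net(R, X=G, Y=D)                  # step 2: refine
    scores = G @ W @ H.T @ D.T                           # refined scores
    print(scores.shape)
```

Because IMC scores a gene-disease pair through the feature matrices rather than through a per-entity latent vector, a gene or disease with no known associations can still be scored once its feature row (here, hypothetically, its similarity row) is available. This is the general mechanism by which inductive matrix completion can handle cold-start cases.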
Authors
PU Jianyu, CHEN Lei, SHAO Kai
Affiliations: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210023, China; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in: CSCD; Peking University Core Journals
2019, No. 7, pp. 1154-1164 (11 pages)
Funding
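National Natural Science Foundation of China (No. 61572263)
Natural Science Foundation of Jiangsu Province (No. BK20161516)
China Postdoctoral Science Foundation (No. 2015M581794)
Jiangsu Postdoctoral Research Funding Program (No. 1501023C)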
Keywords
gene-disease association prediction
matrix completion
heterogeneous information networks
elastic-net regularization
biomedical information processing