Abstract
Traditional single-locus genome-wide association studies (GWAS) suffer from low reproducibility and poor interpretability, while epistasis analysis based on machine learning faces high computational complexity and insufficient prediction accuracy. This paper proposes a new method for genome-wide epistasis analysis built on a two-stage framework: a feature filtering stage followed by an epistatic combinatorial optimization stage. In the filtering stage, a multi-criteria fusion strategy evaluates genetic variants from several different perspectives so that susceptibility loci with weak effects are retained; a multi-criteria rank fusion strategy then removes variants that are only weakly associated with disease status. In the combinatorial optimization stage, a greedy algorithm searches the combination space heuristically to reduce time complexity, and a support vector machine serves as the epistasis evaluation model. In the experiments, the method is compared with the classical algorithms SNPruler and ACO under different linkage disequilibrium parameters. The results show that the method effectively retains weak-effect loci and improves disease prediction accuracy to some extent.
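The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which criteria are fused or how ranks are combined, so mean-rank fusion is an assumption here, and the function names, SNP identifiers, and scores are all hypothetical. The evaluator `toy_accuracy` is a stand-in for the support vector machine; in practice it would presumably return something like cross-validated classification accuracy for the chosen SNP subset.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank SNPs under one criterion; rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda snp: scores[snp], reverse=True)
    return {snp: i + 1 for i, snp in enumerate(ordered)}


def fuse_and_filter(criteria: Sequence[Dict[str, float]], keep: int) -> List[str]:
    """Stage 1: fuse per-criterion ranks by their mean (an assumed rule)
    and keep the `keep` SNPs with the best fused rank."""
    ranks = [rank_positions(c) for c in criteria]
    snps = list(criteria[0])
    mean_rank = {s: sum(r[s] for r in ranks) / len(ranks) for s in snps}
    return sorted(snps, key=lambda s: mean_rank[s])[:keep]


def greedy_search(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
    max_size: int,
) -> Tuple[List[str], float]:
    """Stage 2: greedy forward search; add the SNP that most improves the
    evaluator's score and stop as soon as no addition improves it."""
    selected: List[str] = []
    best = float("-inf")
    while len(selected) < max_size:
        trials = [(evaluate(selected + [s]), s)
                  for s in candidates if s not in selected]
        if not trials:
            break
        score, snp = max(trials)
        if score <= best:
            break
        selected.append(snp)
        best = score
    return selected, best


def toy_accuracy(subset: List[str]) -> float:
    """Hypothetical evaluator standing in for an SVM: rs1 and rs2 have weak
    marginal effects and a strong joint (epistatic) effect."""
    score = 0.5
    if "rs1" in subset:
        score += 0.10
    if "rs2" in subset:
        score += 0.05
    if "rs1" in subset and "rs2" in subset:
        score += 0.25
    return score - 0.01 * len(subset)  # small penalty per added SNP


# Two made-up criteria that disagree about rs1 and rs2.
criterion_a = {"rs1": 9.0, "rs2": 1.0, "rs3": 5.0, "rs4": 0.5, "rs5": 4.0}
criterion_b = {"rs1": 0.1, "rs2": 0.9, "rs3": 0.6, "rs4": 0.05, "rs5": 0.5}

kept = fuse_and_filter([criterion_a, criterion_b], keep=4)
combo, score = greedy_search(kept, toy_accuracy, max_size=3)
print(kept, combo, round(score, 2))
```

In this toy data, rs1 and rs2 each look weak under one of the two criteria, yet both survive the fused filter because the other criterion ranks them highly, which is the motivation the abstract gives for using multiple criteria. Note that a greedy search of this kind only finds an interaction when at least one of the loci involved also improves the score on its own.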
Source
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, Issue 10, pp. 155-160 (6 pages)
Journal of Hunan University: Natural Sciences
Funding
National Natural Science Foundation of China (61672223)
Natural Science Foundation of Hunan Province (2016JJ4029)
Keywords
GWAS (Genome-Wide Association Study)
epistasis
complex diseases
intelligent computing