
Parameterized Algorithm of the Individual Haplotyping MSR Problem with Mate-Pairs (cited by: 2)
Abstract: The individual haplotyping MSR (minimum SNP removal) problem is the computational problem of determining an individual's haplotypes from that individual's DNA fragment sequencing data by removing the minimum number of SNP (single-nucleotide polymorphism) sites. For this problem, Bafna et al. proposed an algorithm with time complexity O(2^k n^2 m), where m is the number of DNA fragments, n is the number of SNP sites, and k is the maximum number of holes (sites with a null value inside a fragment) in a fragment. Because a single Mate-Pair fragment can contain as many as 100 holes, Bafna's algorithm is generally impractical when the fragment data include Mate-Pairs. Based on the characteristics of fragment data, this paper presents a new algorithm with time complexity O((n-1)(k_1-1) k_2 2^(2h) + (k_1+1) 2^h + n k_2 + m k_1), where k_1 is the maximum number of SNP sites covered by a fragment (no more than n), k_2 is the maximum number of fragments covering the same SNP site (usually no more than 19), and h is the maximum number of fragments that cover a SNP site but take a null value at that site (no more than k_2). Because this bound does not depend directly on k, the algorithm remains efficient on fragment data with Mate-Pairs, scales well, and is of practical value.
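To make the problem statement concrete, the following is a minimal brute-force sketch of MSR in Python. It is not the parameterized algorithm of the paper, nor Bafna's algorithm, and it runs in exponential time. It assumes the feasibility notion commonly used in the haplotyping literature, which the abstract does not spell out: two fragments conflict if they take different non-null values at some kept SNP site, and a fragment set is feasible if it can be split into two conflict-free groups. The names `conflict`, `is_feasible`, and `msr_brute_force`, and the encoding of fragments as strings over `0`, `1`, and `-` (null), are illustrative assumptions.

```python
from itertools import combinations


def conflict(f, g, kept):
    """True if fragments f and g disagree at some kept site where both are non-null."""
    return any(f[j] != '-' and g[j] != '-' and f[j] != g[j] for j in kept)


def is_feasible(fragments, kept):
    """Check by 2-coloring whether the conflict graph restricted to `kept` sites is bipartite."""
    m = len(fragments)
    color = [None] * m
    for start in range(m):
        if color[start] is not None:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(m):
                if v == u or not conflict(fragments[u], fragments[v], kept):
                    continue
                if color[v] is None:
                    color[v] = 1 - color[u]
                    stack.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle: no split into two haplotypes
    return True


def msr_brute_force(fragments):
    """Return a smallest set of SNP site indices whose removal makes the fragments feasible."""
    n = len(fragments[0])
    for r in range(n + 1):  # try removal sets in order of increasing size
        for removed in combinations(range(n), r):
            kept = [j for j in range(n) if j not in removed]
            if is_feasible(fragments, kept):
                return set(removed)


# Hypothetical 4-fragment, 4-site instance; the third fragment mimics a
# Mate-Pair, with holes between its two ends.
fragments = ["00--", "01-0", "1--1", "-1-0"]
print(sorted(msr_brute_force(fragments)))  # prints [0]
```

In this instance the first three fragments conflict pairwise, so no split into two groups exists until site 0 is removed. The search enumerates up to 2^n removal sets, which illustrates why bounds expressed in small parameters such as k_1, k_2, and h matter in practice.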
Source: Journal of Software (软件学报); indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 9, pp. 2070-2082 (13 pages)
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60433020; the Program for New Century Excellent Talents in University of China under Grant No. NCET-05-0683; the Program for Changjiang Scholars and Innovative Research Team in University of China under Grant No. IRT0661; and the Scientific Research Fund of Hunan Provincial Education Department of China under Grant No. 06C52
Keywords: single-nucleotide polymorphisms (SNPs); genotype; haplotype; parameterized algorithm; computational complexity
  • Related literature

References (19)

  • 1. Miller PT, Gu Z, Li Q, Hillier L, Kwok PY. Overlapping genomic sequences: A treasure trove of single-nucleotide polymorphisms. Genome Research, 1998, 8(7): 748-754. Cited by: 1
  • 2. Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, Jiang R, Messer CJ, Chew A, Han JH, Duan J, Carr JL, Lee MS, Koshy B, Kumar AM, Zhang G, Newell WR, Windemuth A, Xu C, Kalbfleisch TS, Shaner SL, Arnold K, Schulz V, Drysdale CM, Nandabalan K, Judson RS, Ruano G, Vovis GF. Haplotype variation and linkage disequilibrium in 313 human genes. Science, 2001, 293(5529): 489-493. Cited by: 1
  • 3. Horikawa Y, Oda N, Cox NJ, Li X, Melander MO, Hara M, Hinokio Y, Lindner TH, Mashima H, Schwarz PEH, Plata LB, Horikawa Y, Oda Y, Yoshiuchi I, Colilla S, Polonsky KS, Wei S, Concannon P, Iwasaki N, Schulze J, Baier LJ, Bogardus C, Groop L, Boerwinkle E, Hanis CL, Bell GI. Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genetics, 2000, 26(2): 163-175. Cited by: 1
  • 4. Lancia G, Bafna V, Istrail S, Lippert R, Schwartz R. SNPs problems, complexity and algorithms. In: Heide FM, ed. Proc. of the 9th Ann. European Symp. on Algorithms. LNCS 2161, Heidelberg: Springer-Verlag, 2001. 182-193. Cited by: 1
  • 5. Roach JC. Random subcloning, pairwise end sequencing, and the molecular evolution of the vertebrate trypsinogens [Ph.D. Thesis]. Seattle: University of Washington, 1998. Cited by: 1
  • 6. Bafna V, Istrail S, Lancia G, Rizzi R. Polynomial and APX-hard cases of the individual haplotyping problem. Theoretical Computer Science, 2005, 335(1): 109-125. Cited by: 1
  • 7. Int'l Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature, 2001, 409(6822): 860-921. Cited by: 1
  • 8. The Int'l SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, 2001, 409(6822): 928-933. Cited by: 1
  • 9. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science, 2001, 291(5507): 1304-1351. Cited by: 1
  • 10. The Int'l HapMap Consortium. A haplotype map of the human genome. Nature, 2005, 437(7063): 1299-1320. Cited by: 1

Co-cited documents (10)

  • 1. Levy S, Sutton G, Ng P C, et al. The diploid genome sequence of an individual human [J]. PLoS Biology, 2007, 5(10). Cited by: 1
  • 2. Tachmazidou I, Verzilli C J, Iorio M D. Genetic association mapping via evolution-based clustering of haplotypes [J]. PLoS Genet, 2007, 3(7). Cited by: 1
  • 3. Zhang X S, Wang R S, Wu L Y, et al. Models and algorithms for haplotyping problem [J]. Current Bioinformatics, 2006, 1(1): 105-114. Cited by: 1
  • 4. Lancia G, Bafna V, Istrail S, et al. SNPs problems, complexity and algorithms [C] // Heide F M. LNCS 2161: Proc of the 9th Ann European Symp on Algorithms. Heidelberg: Springer, 2001: 182-193. Cited by: 1
  • 5. Cilibrasi R, Iersel L, Kelk S, et al. The complexity of the single individual SNP haplotyping problem [J]. Algorithmica, 2007, 49(1): 13-36. Cited by: 1
  • 6. Wang R S, Wu L Y, Li Z P, et al. Haplotype reconstruction from SNP fragments by minimum error correction [J]. Bioinformatics, 2005, 21(10): 2456-2462. Cited by: 1
  • 7. Panconesi A, Sozio M. Fast hare: a fast heuristic for single individual SNP haplotype reconstruction [C] // Jonassen I, Kim J. LNCS 3240: Proc of the 4th Int'l Workshop on Algorithms in Bioinformatics. Heidelberg: Springer, 2004: 266-277. Cited by: 1
  • 8. Myers G. A dataset generator for whole genome shotgun sequencing [C] // Lengauer T. Proc of the 7th Int'l Conf Intelligent Systems for Molecular Biology. California: AAAI Press, 1999: 202-210. Cited by: 1
  • 9. Xie Minzhu, Wang Jianxin, Chen Jianer. Research on parameterized algorithms for the haplotype assembly MEC problem [J]. Computer Engineering and Applications, 2007, 43(35): 57-60. Cited by: 1
  • 10. Xie Minzhu, Chen Jianer, Wang Jianxin. Research on parameterized algorithms for the individual haplotyping problem [J]. Chinese Journal of Computers, 2009, 32(8): 1637-1650. Cited by: 4

Citing documents (2)
