Hormozdiari Fereydoun, Alkan Can, Eichler Evan E, Sahinalp S Cenk
School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6.
Genome Res. 2009 Jul;19(7):1270-8. doi: 10.1101/gr.088633.108. Epub 2009 May 15.
Recent studies show that, along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing technologies. The advent of new ultra-high-throughput sequencing platforms now makes it feasible to detect the full spectrum of genomic variation among many individual genomes, including cancer patients and others suffering from diseases of genomic origin. Unfortunately, existing algorithms for identifying structural variation (SV) among individuals were not designed to handle the short read lengths and the sequencing errors characteristic of "next-gen" sequencing (NGS) technologies. In this paper, we give combinatorial formulations for SV detection between a reference genome sequence and a next-gen-based, paired-end, whole-genome shotgun-sequenced individual. We describe efficient algorithms for each of the formulations we give, all of which turn out to be fast and quite reliable; they are also applicable to all next-gen sequencing methods (Illumina, 454 Life Sciences [Roche], ABI SOLiD, etc.) and to traditional capillary sequencing technology. We apply our algorithms to identify SV among individual genomes very recently sequenced by Illumina technology.
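The abstract does not spell out the formulations themselves. As a rough illustration only, the following Python sketch shows two ideas common to paired-end SV detection: labeling a read-pair mapping as discordant from its span and strand orientation, and choosing a small set of candidate SVs that together explain all discordant reads by greedy set cover (a maximum-parsimony style formulation). This is a hedged sketch under stated assumptions, not the authors' implementation: `PairMapping`, `classify`, `greedy_cover`, the forward/reverse ('+', '-') library orientation, and the insert-size bounds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairMapping:
    """One mapping of a paired-end read to the reference (hypothetical layout)."""
    read_id: str
    left_pos: int      # mapped start of the leftmost end
    right_pos: int     # mapped start of the rightmost end (right_pos >= left_pos)
    left_strand: str   # '+' or '-'
    right_strand: str  # '+' or '-'


def classify(m: PairMapping, min_insert: int, max_insert: int) -> str:
    """Label a mapping by the read-pair signature it shows.

    Assumes a forward/reverse ('+', '-') library and approximates the
    insert size as the distance between the two mapped start positions.
    """
    if m.left_strand == m.right_strand:
        return "inversion"          # both ends on the same strand
    if (m.left_strand, m.right_strand) != ("+", "-"):
        return "other"              # e.g. an everted pair; not handled here
    span = m.right_pos - m.left_pos
    if span > max_insert:
        return "deletion"           # ends map farther apart than expected
    if span < min_insert:
        return "insertion"          # ends map closer together than expected
    return "concordant"


def greedy_cover(candidates: dict[str, set[str]]) -> list[str]:
    """Pick a small set of candidate SVs that explains every discordant read.

    `candidates` maps a hypothetical SV label to the set of read ids it
    would explain. Each round picks the SV explaining the most reads that
    are still unexplained (ties go to the earliest-inserted label).
    """
    uncovered: set[str] = set().union(*candidates.values()) if candidates else set()
    chosen: list[str] = []
    while uncovered:
        best = max(candidates, key=lambda sv: len(candidates[sv] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Expected insert size assumed to lie in [200, 400] bp for this example.
    print(classify(PairMapping("r1", 1000, 6500, "+", "-"), 200, 400))  # deletion
    cands = {"del_A": {"r1", "r2", "r3"}, "del_B": {"r3", "r4"}, "del_C": {"r4"}}
    print(greedy_cover(cands))  # ['del_A', 'del_B']
```

The greedy rule is the standard logarithmic-factor approximation for set cover; it favors a few SVs that each account for many discordant reads, which is the parsimony intuition, but the choice of this specific heuristic here is our assumption.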