Division of Medicine, University College London, London WC1E 6BT, UK; Institute of Biotechnology, University of Helsinki, Helsinki 00014, Finland.
Bioinformatics. 2013 Feb 15;29(4):413-9. doi: 10.1093/bioinformatics/bts704. Epub 2012 Dec 13.
Linkage analysis remains an important tool in elucidating the genetic component of disease and has become even more important with the advent of whole exome sequencing, enabling the user to focus on only those genomic regions co-segregating with Mendelian traits. Unfortunately, methods to perform multipoint linkage analysis scale poorly with either the number of markers or the size of the pedigree. Large pedigrees with many markers can only be evaluated with Markov chain Monte Carlo (MCMC) methods, which are slow to converge and, because no attempts have been made to exploit parallelism, massively underuse available processing power. Here, we describe SWIFTLINK, a novel application that performs MCMC linkage analysis by spreading the computational burden across multiple processor cores and a graphics processing unit (GPU) simultaneously. SWIFTLINK was designed around the concept of explicitly matching the characteristics of an algorithm with the underlying computer architecture to maximize performance.
We implement our approach using existing Gibbs samplers redesigned for parallel hardware. We applied SWIFTLINK to a real-world dataset, performing parametric multipoint linkage analysis on a highly consanguineous pedigree with EAST syndrome, containing 28 members, where a subset of individuals were genotyped with single nucleotide polymorphisms (SNPs). In our experiments with a four-core CPU and GPU, SWIFTLINK achieves an 8.5× speed-up over the single-threaded version and a 109× speed-up over the popular linkage analysis program SIMWALK.
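The two speed-up figures can be related by simple arithmetic. The sketch below is illustrative only: it assumes both figures refer to the same parallel run on the same dataset, which the abstract does not state explicitly, and the function and variable names are hypothetical. Under that assumption, dividing one ratio by the other gives the implied speed-up of single-threaded SWIFTLINK over SIMWALK.

```python
def implied_speedup(over_baseline: float, over_single_thread: float) -> float:
    """Ratio of two speed-ups measured against the same parallel run.

    If T_base / T_par = over_baseline and T_single / T_par = over_single_thread,
    then T_base / T_single = over_baseline / over_single_thread.
    """
    if over_single_thread <= 0:
        raise ValueError("speed-up factors must be positive")
    return over_baseline / over_single_thread


# Figures quoted in the abstract.
speedup_vs_single_thread = 8.5   # parallel SWIFTLINK vs. single-threaded SWIFTLINK
speedup_vs_simwalk = 109.0       # parallel SWIFTLINK vs. SIMWALK

# Assumption: both ratios share the same parallel run time as denominator.
implied = implied_speedup(speedup_vs_simwalk, speedup_vs_single_thread)
print(f"Implied single-threaded speed-up over SIMWALK: about {implied:.1f}x")
```

If the assumption holds, roughly a factor of 13 of the overall gain would come from the single-threaded implementation itself, with the remaining 8.5× coming from the multi-core CPU and GPU.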
SWIFTLINK is available at https://github.com/ajm/swiftlink. All source code is licensed under GPLv3.