EPCC, The University of Edinburgh, Edinburgh EH9 3JZ, UK.
Bioinformatics. 2012 Dec 1;28(23):3134-6. doi: 10.1093/bioinformatics/bts571. Epub 2012 Sep 27.
The Genome-wide Complex Trait Analysis (GCTA) software package can quantify the contribution of genetic variation to phenotypic variation for complex traits. However, as the datasets of interest continue to grow, running GCTA becomes computationally prohibitive. We present an adapted version, Advanced Complex Trait Analysis (ACTA), which demonstrates dramatically improved performance.
We restructure the genetic relationship matrix (GRM) estimation phase of the code and introduce the highly optimized parallel Basic Linear Algebra Subprograms (BLAS) library, combined with manual parallelization and optimization. We also introduce the Linear Algebra PACKage (LAPACK) library into the restricted maximum likelihood (REML) analysis stage. For a test case with 8999 individuals and 279,435 single nucleotide polymorphisms (SNPs), run on a compute node with two multi-core Intel Nehalem CPUs, we reduce the total runtime from ∼17 h to ∼11 min, roughly a 90-fold speedup.
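To illustrate why restructuring GRM estimation around BLAS can pay off, the following is a minimal sketch, not ACTA's actual code. It assumes the commonly used GCTA-style estimator A_jk = (1/N) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)), which is not stated in this abstract, and it assumes genotypes coded 0/1/2 with no missing values. The function names are hypothetical. Under those assumptions the triple loop collapses into a single matrix product Z Zᵀ / N, which NumPy passes to whichever BLAS library it is linked against.

```python
import numpy as np


def drop_monomorphic(genotypes):
    """Remove SNP columns with allele frequency 0 or 1 (zero variance)."""
    p = genotypes.mean(axis=0) / 2.0
    return genotypes[:, (p > 0.0) & (p < 1.0)]


def grm_loop(genotypes):
    """Reference version: explicit loops over pairs of individuals and SNPs."""
    n, m = genotypes.shape
    p = genotypes.mean(axis=0) / 2.0  # allele frequency per SNP
    grm = np.zeros((n, n))
    for j in range(n):
        for k in range(j, n):
            total = 0.0
            for i in range(m):
                total += ((genotypes[j, i] - 2.0 * p[i])
                          * (genotypes[k, i] - 2.0 * p[i])
                          / (2.0 * p[i] * (1.0 - p[i])))
            grm[j, k] = grm[k, j] = total / m
    return grm


def grm_matmul(genotypes):
    """Same estimator written as one matrix product (a BLAS-backed call)."""
    m = genotypes.shape[1]
    p = genotypes.mean(axis=0) / 2.0
    # Standardise each SNP column: centre by 2p, scale by sqrt(2p(1-p)).
    z = (genotypes - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return (z @ z.T) / m


if __name__ == "__main__":
    # Toy data only (assumption): 20 individuals, up to 100 SNPs.
    rng = np.random.default_rng(1)
    freqs = rng.uniform(0.1, 0.9, size=100)
    toy = drop_monomorphic(rng.binomial(2, freqs, size=(20, 100)).astype(float))
    print(np.allclose(grm_loop(toy), grm_matmul(toy)))
```

Both functions compute the same matrix. The second hands the O(n²N) work to a tuned, typically multi-threaded library routine, which is the general idea behind the restructuring described above.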
The source code is fully available under the GNU Public License, along with Linux binaries. For more information see http://www.epcc.ed.ac.uk/software-products/acta.
Supplementary data are available at Bioinformatics online.