Kim Bongsong, Beavis William D
Department of Agronomy, Iowa State University, Ames, IA, USA.
Evol Bioinform Online. 2017 Mar 10;13:1176934316688663. doi: 10.1177/1176934316688663. eCollection 2017.
We introduce Numericware i, software that computes an identical by state (IBS) matrix from genotypic data. Calculating an IBS matrix for a large dataset requires a large amount of computer memory and a long processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. Multithreading allows computational routines to run concurrently on multiple central processing unit (CPU) threads. Forward chopping addresses memory limitations by dividing a dataset into appropriately sized subsets. Together, these methods allow Numericware i to calculate the IBS matrix for a large genotypic dataset on a laptop or desktop computer. To compare it with other software, we calculated genetic relationship matrices from the same genotypic dataset using Numericware i, SPAGeDi, and TASSEL. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values, including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high (0.9972), whereas the SPAGeDi matrix correlated poorly with those from Numericware i (0.0505) and TASSEL (0.0587). With a high-dimensional dataset of 500 entities by 10 000 000 SNPs, Numericware i completed the calculation in 382 minutes using 19 CPU threads and 64 GB of memory, dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed on the same dataset. Numericware i is freely available for Windows and Linux under the CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db.
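The two ideas named in the abstract, splitting the dataset into pieces and running routines on several threads, can be illustrated with a minimal Python sketch. This is not Numericware i's implementation. It assumes genotypes are coded as 0, 1, or 2 copies of an allele and scores each locus as 2 - |g_i - g_j|, averaged over loci, which is one common IBS definition that yields values between 0 and 2. The function names `ibs_chunk_rows` and `ibs_matrix` and the parameters `n_pieces` and `n_threads` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ibs_chunk_rows(chunk, rows):
    """Sum per-locus scores (2 - |g_i - g_j|) for the given rows against all rows.

    chunk: (n_individuals, n_snps_in_chunk) array, genotypes coded 0/1/2 (assumed).
    rows:  1-D array of row indices handled by one worker.
    """
    diff = np.abs(chunk[rows][:, None, :] - chunk[None, :, :])
    return (2 - diff).sum(axis=2)  # shape: (len(rows), n_individuals)


def ibs_matrix(genotypes, n_pieces=3, n_threads=4):
    """Average IBS score per pair, accumulated over SNP pieces.

    Only one piece of SNP columns is expanded at a time, so peak memory
    depends on the piece size rather than on the full number of SNPs.
    """
    genotypes = np.asarray(genotypes)
    n, m = genotypes.shape
    total = np.zeros((n, n), dtype=np.float64)
    row_blocks = [b for b in np.array_split(np.arange(n), n_threads) if b.size]

    for cols in np.array_split(np.arange(m), n_pieces):
        if cols.size == 0:
            continue
        # Signed dtype so the subtraction cannot wrap around.
        chunk = genotypes[:, cols].astype(np.int16)
        # Each worker thread computes a block of rows for this piece.
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            parts = list(pool.map(lambda r: ibs_chunk_rows(chunk, r), row_blocks))
        total += np.vstack(parts)

    return total / m
```

Calling `ibs_matrix(genotypes, n_pieces=3, n_threads=19)` would mirror the piece and thread counts reported above, although the timing and memory figures in the abstract apply only to Numericware i itself. Because the per-piece sums are simply added together, the number of pieces changes memory use but not the resulting matrix.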