Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification.

Affiliations

Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA.

Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA.

Publication information

Am J Hum Genet. 2020 Apr 2;106(4):453-466. doi: 10.1016/j.ajhg.2020.02.012. Epub 2020 Mar 19.

Identity-by-descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it with Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these with Beagle 5 takes 4.3 CPU days, followed by either Refined IBD or GERMLINE segment detection in 2.9 or 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min or 7.8 min with IBD2 functionality enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding other methods tested and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. While allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, the fastest are biased in admixed samples, with KING inferring 30.8% fewer fifth degree Mexican American relatives correctly compared with IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days parallelized across 128 cores.
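The abstract says IBIS finds long regions of allele sharing without phasing. One standard phase-free signal behind this kind of detector is that two people who share a haplotype IBD cannot be opposite homozygotes (genotype 0 vs 2) anywhere in that region, and IBD2 regions additionally require identical genotypes. The sketch below illustrates that idea under stated assumptions: genotypes are coded as alternate-allele counts (0, 1, 2), marker positions are given in cM, and the 7 cM default mirrors the abstract's ≥7 cM evaluation threshold. The helper names (`ibd1_candidates`, `ibd2_candidates`) are hypothetical. The sketch has no windowing or genotype-error tolerance, so it is not IBIS's actual algorithm.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (first_marker, last_marker), inclusive indices


def _long_runs(is_break: Sequence[bool], pos_cm: Sequence[float],
               min_cm: float) -> List[Segment]:
    """Return maximal runs of non-break markers spanning >= min_cm."""
    segments: List[Segment] = []
    start = 0
    n = len(pos_cm)
    for i in range(n + 1):
        # i == n acts as a sentinel break that closes the final run
        if i == n or is_break[i]:
            end = i - 1
            if end >= start and pos_cm[end] - pos_cm[start] >= min_cm:
                segments.append((start, end))
            start = i + 1
    return segments


def _check(a: Sequence[int], b: Sequence[int], pos_cm: Sequence[float]) -> None:
    if not (len(a) == len(b) == len(pos_cm)):
        raise ValueError("genotype and position arrays must have equal length")


def ibd1_candidates(a: Sequence[int], b: Sequence[int],
                    pos_cm: Sequence[float], min_cm: float = 7.0) -> List[Segment]:
    """Hypothetical helper: long runs with no opposite homozygotes (0 vs 2).
    Sharing at least one haplotype IBD forbids such sites; this toy version
    treats every opposite homozygote as a hard break (no error tolerance)."""
    _check(a, b, pos_cm)
    breaks = [abs(x - y) == 2 for x, y in zip(a, b)]
    return _long_runs(breaks, pos_cm, min_cm)


def ibd2_candidates(a: Sequence[int], b: Sequence[int],
                    pos_cm: Sequence[float], min_cm: float = 7.0) -> List[Segment]:
    """Hypothetical helper: long runs where the two genotypes are identical,
    the signal expected when both haplotypes are shared IBD."""
    _check(a, b, pos_cm)
    breaks = [x != y for x, y in zip(a, b)]
    return _long_runs(breaks, pos_cm, min_cm)


if __name__ == "__main__":
    a = [0, 1, 2, 0, 1, 2, 2, 0]
    b = [0, 1, 2, 2, 1, 2, 1, 0]
    pos = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 15.0, 20.0]
    print(ibd1_candidates(a, b, pos))  # [(4, 7)]
    print(ibd2_candidates(a, b, pos))  # []
```

Because IBD2 regions also lack opposite homozygotes, the IBD1 candidates here include them; telling the two apart requires the stricter identity check in `ibd2_candidates`.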

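To classify relationships from detected segments, one common approach (assumed here, and not necessarily the exact rule IBIS uses) converts the fractions of the genome shared IBD1 and IBD2, k1 and k2, into a kinship coefficient φ = k1/4 + k2/2, then bins φ with KING-style thresholds in which degree d covers [2^-(d+1.5), 2^-(d+0.5)). The function names, the `max_degree` cutoff, and the genome length used in the example are hypothetical.

```python
import math
from typing import Optional


def kinship_from_ibd(ibd1_cm: float, ibd2_cm: float, genome_cm: float) -> float:
    """Kinship phi = k1/4 + k2/2, where k1 and k2 are the fractions of the
    genome (measured in cM) shared IBD1 and IBD2. genome_cm is caller-supplied."""
    if genome_cm <= 0:
        raise ValueError("genome_cm must be positive")
    k1 = ibd1_cm / genome_cm
    k2 = ibd2_cm / genome_cm
    return k1 / 4 + k2 / 2


def degree_from_kinship(phi: float, max_degree: int = 6) -> Optional[int]:
    """KING-style bins (an assumption): degree d covers [2^-(d+1.5), 2^-(d+0.5)).
    Degree 0 means duplicate or MZ twin. Returns None when phi <= 0 or the pair
    is more distant than max_degree."""
    if phi <= 0:
        return None
    # phi in [2^-(d+1.5), 2^-(d+0.5)) <=> ceil(-log2(phi) - 1.5) == d
    d = max(0, math.ceil(-math.log2(phi) - 1.5))
    return d if d <= max_degree else None


if __name__ == "__main__":
    # Hypothetical 3,500 cM genome; full siblings share ~1/2 IBD1 and ~1/4 IBD2.
    phi = kinship_from_ibd(1750, 875, 3500)
    print(phi, degree_from_kinship(phi))  # 0.25 1
```

Allele frequency-based tools such as KING estimate φ directly from genotypes rather than from segments; per the abstract, the fastest of these are biased in admixed samples, which is where segment-based estimates like the one above can help.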
