Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
USF Genomics, College of Public Health, University of South Florida, Tampa, FL, 33612, USA.
Genome Biol. 2019 Jul 25;20(1):143. doi: 10.1186/s13059-019-1754-8.
While genetic relatedness, usually manifested as segments identical by descent (IBD), is ubiquitous in modern large biobanks, current IBD detection methods are not efficient at such a scale. Here, we describe an efficient method, RaPID, for detecting IBD segments in a panel of phased haplotypes. RaPID achieves time and space complexity linear in the input size and the number of reported IBD segments. In simulations, we show that RaPID is orders of magnitude faster than existing methods while offering competitive power and accuracy. In the UK Biobank, RaPID identified 3,335,807 IBD segments with a length ≥ 10 cM among 223,507 male X chromosomes in 11 min.
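To make the scaling problem concrete, the following is a minimal sketch of a naive all-pairs scan for shared segments in a phased haplotype panel. It is not RaPID's algorithm, which the abstract does not detail; it is the kind of quadratic baseline that a method linear in the input size avoids. The function names `shared_segments` and `pair_count` are hypothetical, haplotypes are assumed to be equal-length strings of alleles, and segment length is counted in sites rather than in cM, since no genetic map is assumed here.

```python
from itertools import combinations


def shared_segments(haplotypes, min_sites):
    """Naive baseline (NOT RaPID): for every pair of haplotypes, report each
    maximal run of identical alleles spanning at least `min_sites` sites.

    Returns a list of (i, j, start, end) tuples with `end` exclusive.
    Assumes all haplotypes have the same length.
    """
    segments = []
    for i, j in combinations(range(len(haplotypes)), 2):
        a, b = haplotypes[i], haplotypes[j]
        run_start = None
        # Iterate one position past the end so a trailing run is closed.
        for pos in range(len(a) + 1):
            if pos < len(a) and a[pos] == b[pos]:
                if run_start is None:
                    run_start = pos
            else:
                if run_start is not None and pos - run_start >= min_sites:
                    segments.append((i, j, run_start, pos))
                run_start = None
    return segments


def pair_count(n):
    """Number of unordered haplotype pairs the naive scan must visit."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Toy panel: 3 haplotypes over 7 biallelic sites (illustrative data only).
    panel = ["0101100", "0101011", "1111100"]
    print(shared_segments(panel, min_sites=4))
    # Pairs a naive scan would visit for the 223,507 haplotypes in the abstract.
    print(pair_count(223_507))
```

The work in this sketch grows with the number of pairs times the number of sites, so at the panel size quoted above it would need on the order of tens of billions of pairwise comparisons per site window, which is why per-pair scanning does not scale to biobank data.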