RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.

Author information

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America.

Department of Computer Science, Rice University, Houston, Texas, United States of America.

Publication information

PLoS Genet. 2021 Jan 21;17(1):e1009315. doi: 10.1371/journal.pgen.1009315. eCollection 2021 Jan.

Abstract

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficient (ϕ) and the genome-wide probability of zero identity-by-descent (IBD) sharing (π0) for all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI first calls IBD segments with the efficient RaPID method, then estimates ϕ and π0 from the detected segments. This inference is achieved by a data-driven approach that adjusts the estimates based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing and genotyping errors, admixture events, and varying marker densities, and that it achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI to offer fast and accurate relatedness inference for even larger cohorts.
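To make the step from IBD segments to ϕ and π0 concrete, the sketch below uses the textbook relationship ϕ = k1/4 + k2/2 and π0 = 1 − k1 − k2, where k1 and k2 are the fractions of the genome shared IBD on one and on both haplotypes. It is not RAFFI's estimator: the abstract states that RAFFI additionally adjusts its estimates for phasing and genotyping quality, and that correction is not modeled here. The `IBDSegment` type, its `ibd_state` field, the function names, and the 3400 cM genome length are hypothetical choices for illustration. The degree cutoffs are the powers-of-two kinship boundaries conventionally used with KING-style kinship coefficients; they do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class IBDSegment:
    """One detected IBD segment for a pair of individuals (hypothetical type)."""
    length_cm: float  # genetic length of the segment in centimorgans
    ibd_state: int    # 1 = one haplotype shared, 2 = both haplotypes shared


def estimate_sharing(segments: Iterable[IBDSegment],
                     genome_length_cm: float) -> Tuple[float, float]:
    """Return (phi, pi0) from IBD segment lengths, with no error adjustment."""
    segments = list(segments)
    ibd1 = sum(s.length_cm for s in segments if s.ibd_state == 1)
    ibd2 = sum(s.length_cm for s in segments if s.ibd_state == 2)
    k1 = ibd1 / genome_length_cm   # fraction of genome shared IBD1
    k2 = ibd2 / genome_length_cm   # fraction of genome shared IBD2
    pi0 = max(0.0, 1.0 - k1 - k2)  # fraction with zero IBD sharing
    phi = k1 / 4.0 + k2 / 2.0      # kinship coefficient
    return phi, pi0


def degree_from_kinship(phi: float) -> str:
    """Map a kinship coefficient to a relatedness degree (conventional cutoffs)."""
    bounds = [
        (2 ** -1.5, "duplicate or monozygotic twin"),  # about 0.354
        (2 ** -2.5, "1st degree"),                     # about 0.177
        (2 ** -3.5, "2nd degree"),                     # about 0.0884
        (2 ** -4.5, "3rd degree"),                     # about 0.0442
    ]
    for lower, label in bounds:
        if phi > lower:
            return label
    return "more distant or unrelated"


if __name__ == "__main__":
    GENOME_CM = 3400.0  # assumed total genetic length, for illustration only
    # Idealized full siblings: half the genome IBD1, a quarter IBD2.
    sibs = [IBDSegment(1700.0, 1), IBDSegment(850.0, 2)]
    phi, pi0 = estimate_sharing(sibs, GENOME_CM)
    print(phi, pi0, degree_from_kinship(phi))
```

Under these assumptions, parent-offspring pairs and full siblings share the same expected ϕ and are separated only by π0, which is why both quantities are estimated.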

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cdd/7853505/aff83fa81b38/pgen.1009315.g001.jpg
