Genotype calling and haplotyping in parent-offspring trios.

Affiliation

Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15224, USA.

Publication information

Genome Res. 2013 Jan;23(1):142-51. doi: 10.1101/gr.142455.112. Epub 2012 Oct 11.

Abstract

Emerging sequencing technologies allow common and rare variants to be systematically assayed across the human genome in many individuals. In order to improve variant detection and genotype calling, raw sequence data are typically examined across many individuals. Here, we describe a method for genotype calling in settings where sequence data are available for unrelated individuals and parent-offspring trios and show that modeling trio information can greatly increase the accuracy of inferred genotypes and haplotypes, especially on low to modest depth sequencing data. Our method considers both linkage disequilibrium (LD) patterns and the constraints imposed by family structure when assigning individual genotypes and haplotypes. Using simulations, we show that trios provide higher genotype calling accuracy across the frequency spectrum, both overall and at hard-to-call heterozygous sites. In addition, trios provide greatly improved phasing accuracy, improving the accuracy of downstream analyses (such as genotype imputation) that rely on phased haplotypes. To further evaluate our approach, we analyzed data on the first 508 individuals sequenced by the SardiNIA sequencing project. Our results show that our method reduces the genotyping error rate by 50% compared with analysis using existing methods that ignore family structure. We anticipate our method will facilitate genotype calling and haplotype inference for many ongoing sequencing projects.
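To make the idea of "constraints imposed by family structure" concrete, the following is a minimal single-site sketch and not the authors' method: it ignores LD, haplotypes, and de novo mutation, and simply combines per-individual genotype likelihoods with a Hardy-Weinberg prior on the parents and Mendelian transmission to the child. All function names, the allele frequency, and the likelihood values are hypothetical and chosen only for illustration.

```python
from itertools import product


def hwe_prior(alt_freq):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 (alternate-allele count)."""
    ref_freq = 1.0 - alt_freq
    return [ref_freq * ref_freq, 2 * ref_freq * alt_freq, alt_freq * alt_freq]


def child_given_parents(child, father, mother):
    """Mendelian probability of the child's genotype given both parents."""
    tf = father / 2.0  # chance the father transmits the alternate allele
    tm = mother / 2.0  # chance the mother transmits the alternate allele
    if child == 0:
        return (1 - tf) * (1 - tm)
    if child == 1:
        return tf * (1 - tm) + (1 - tf) * tm
    return tf * tm


def call_independent(lik, alt_freq):
    """Most probable genotype for one individual, ignoring relatives."""
    prior = hwe_prior(alt_freq)
    return max(range(3), key=lambda g: prior[g] * lik[g])


def call_trio(lik_father, lik_mother, lik_child, alt_freq):
    """Most probable joint (father, mother, child) genotype configuration."""
    prior = hwe_prior(alt_freq)
    best, best_score = None, -1.0
    for f, m, c in product(range(3), repeat=3):
        score = (prior[f] * lik_father[f]
                 * prior[m] * lik_mother[m]
                 * child_given_parents(c, f, m) * lik_child[c])
        if score > best_score:
            best, best_score = (f, m, c), score
    return best


# Hypothetical likelihoods P(reads | genotype) for genotypes 0, 1, 2.
# Parents: well supported homozygous reference. Child: noisy low-depth data.
lik_father = [0.99, 0.01, 0.0001]
lik_mother = [0.99, 0.01, 0.0001]
lik_child = [0.3, 0.6, 0.1]

independent_child = call_independent(lik_child, alt_freq=0.3)
trio_call = call_trio(lik_father, lik_mother, lik_child, alt_freq=0.3)
print("independent child call:", independent_child)
print("joint trio call (father, mother, child):", trio_call)
```

In this toy setting the child's noisy data on its own favors a heterozygous call, while the joint model prefers the Mendelian-consistent configuration in which all three individuals are homozygous reference. The method described in the abstract additionally uses LD patterns across sites, which this sketch does not attempt to model.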

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1f3/3530674/80e77b80f4e5/142fig1.jpg
