Whole-genome characterization in pedigreed non-human primates using genotyping-by-sequencing (GBS) and imputation.

Author information

Bimber Benjamin N, Raboin Michael J, Letaw John, Nevonen Kimberly A, Spindel Jennifer E, McCouch Susan R, Cervera-Juanes Rita, Spindel Eliot, Carbone Lucia, Ferguson Betsy, Vinson Amanda

Affiliations

Primate Genetics Section, Oregon National Primate Research Center, Beaverton, OR, USA.

Oregon Health & Science University, Portland, OR, USA.

Publication information

BMC Genomics. 2016 Aug 24;17(1):676. doi: 10.1186/s12864-016-2966-x.

Abstract

BACKGROUND

Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still in its infancy. Whole-genome sequence (WGS) data in large pedigreed macaque colonies could provide substantial experimental power for genetic discovery, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for 30X WGS, followed by low-cost genotyping-by-sequencing (GBS) at 30X on the remaining macaques in order to generate sparse genotype data at high accuracy. Dense variants from the selected macaques with WGS data are then imputed into macaques having only sparse GBS data, resulting in dense genome-wide genotypes throughout the pedigree.

RESULTS

We developed GBS for the macaque genome using digestion with PstI, followed by sequencing of size-selected fragments at 30X coverage. From GBS sequence data collected on all individuals in a 16-member pedigree, we characterized high-confidence genotypes at 22,455 single nucleotide variant (SNV) sites that were suitable for guiding imputation of dense sequence data from WGS. To characterize dense markers for imputation, we performed WGS at 30X coverage on nine of the 16 individuals, yielding 10,193,425 high-confidence SNVs. To validate the use of GBS data for facilitating imputation, we initially focused on chromosome 19 as a test case, using an optimized panel of 833 sparse, evenly spaced markers from GBS and 5,010 dense markers from WGS. Using the method of "Genotype Imputation Given Inheritance" (GIGI), we evaluated how imputation accuracy was affected by 3 different strategies for selecting individuals for WGS: 1) using "GIGI-Pick" to select the most informative individuals, 2) using the most recent generation, or 3) using founders only. We also evaluated how imputation accuracy changed when between 1 and 9 WGS individuals were used for imputation. We found that the GIGI-Pick algorithm for selection of WGS individuals outperformed common heuristic approaches, and that genotype numbers and accuracy improved very little when using >5 WGS individuals for imputation. Informed by our findings, we used 4 macaques with WGS data to impute variants at up to 7,655,491 sites spanning all 20 autosomes in the 12 remaining macaques, based on their GBS genotypes at only 17,158 loci. Using a strict confidence threshold, we imputed an average of 3,680,238 variants per individual at >99% accuracy, or an average of 4,458,883 variants per individual at a more relaxed threshold, yielding >97% accuracy.
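The trade-off between a strict and a relaxed confidence threshold can be illustrated with a small sketch. It assumes the imputation step emits a probability for each of the three genotypes at every site, and that a genotype is called only when its highest probability reaches the threshold. The function names, the toy probabilities, and the threshold values 0.9 and 0.5 are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence, Tuple

# Genotypes are coded as 0, 1 or 2 copies of the alternate allele.


def call_genotype(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the most probable genotype, or None if it falls below the threshold."""
    best = max(range(len(probs)), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def summarize(
    prob_rows: Sequence[Sequence[float]],
    truth: Sequence[int],
    threshold: float,
) -> Tuple[int, float]:
    """Count called genotypes and the fraction of those calls matching the truth."""
    calls = [call_genotype(p, threshold) for p in prob_rows]
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    n_called = len(called)
    n_correct = sum(1 for c, t in called if c == t)
    accuracy = n_correct / n_called if n_called else float("nan")
    return n_called, accuracy


# Toy data (hypothetical): four sites with genotype probabilities and known genotypes.
probs = [
    (0.95, 0.04, 0.01),
    (0.10, 0.85, 0.05),
    (0.55, 0.40, 0.05),
    (0.02, 0.08, 0.90),
]
truth = [0, 1, 1, 2]

strict = summarize(probs, truth, threshold=0.9)   # fewer calls, higher accuracy
relaxed = summarize(probs, truth, threshold=0.5)  # more calls, lower accuracy
print(strict, relaxed)
```

In this toy case the strict threshold calls 2 of 4 sites, both correctly, while the relaxed threshold calls all 4 sites with one error. This mirrors the direction of the reported result: more imputed variants per individual at somewhat lower accuracy.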

CONCLUSIONS

We conclude that an optimal tradeoff between genotype accuracy, number of imputed genotypes, and overall cost exists at the ratio of one individual selected for WGS using the GIGI-Pick algorithm, per 3-5 relatives selected for GBS. This approach makes feasible the collection of accurate, dense genome-wide sequence data in large pedigreed macaque cohorts without the need for more expensive WGS data on all individuals.
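As a rough illustration of the recommended ratio, the sketch below splits a cohort into WGS and GBS animals at one WGS individual per 3 to 5 GBS relatives and compares the cost with sequencing every animal by WGS. The per-sample costs are hypothetical placeholder units, not prices from the study, and the function names are illustrative only. Choosing which animals receive WGS (the role of GIGI-Pick) is not modeled here.

```python
import math
from typing import Tuple


def plan_cohort(n_animals: int, gbs_per_wgs: int) -> Tuple[int, int]:
    """Split a cohort into (WGS, GBS) counts at 1 WGS animal per `gbs_per_wgs` GBS animals."""
    group_size = gbs_per_wgs + 1
    n_wgs = math.ceil(n_animals / group_size)
    return n_wgs, n_animals - n_wgs


def relative_cost(n_wgs: int, n_gbs: int, wgs_cost: float, gbs_cost: float) -> float:
    """Cost of the mixed design as a fraction of running WGS on every animal."""
    mixed = n_wgs * wgs_cost + n_gbs * gbs_cost
    all_wgs = (n_wgs + n_gbs) * wgs_cost
    return mixed / all_wgs


# A 16-member pedigree at a 1:3 ratio gives 4 WGS and 12 GBS animals,
# the same split used for the genome-wide imputation described above.
n_wgs, n_gbs = plan_cohort(16, gbs_per_wgs=3)

# Hypothetical cost units: WGS assumed to cost 10 times as much as GBS per sample.
fraction = relative_cost(n_wgs, n_gbs, wgs_cost=10.0, gbs_cost=1.0)
print(n_wgs, n_gbs, round(fraction, 3))
```

Under these assumed costs the mixed design comes to about a third of the all-WGS cost. The actual saving depends on real per-sample prices, which the abstract does not report.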


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4a8/4997765/890c21403a02/12864_2016_2966_Fig1_HTML.jpg
