A novel reference genome improves variant resolution for use in breed-specific GWAS.

Affiliations

Asymmetric Operations Sector, The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA.

McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.

Publication information

Life Sci Alliance. 2021 Jan 29;4(4). doi: 10.26508/lsa.202000902. Print 2021 Apr.

Abstract

Reference genome fidelity is critically important for genome-wide association studies (GWAS), yet most reference genomes differ widely from the study population. A typical whole-genome sequencing approach relies on short-read technologies, which produce fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, because lower-resolution technologies such as genotyping arrays are commonly used. Here, we present a phased canine reference genome built using high molecular weight DNA-sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flowcells (∼23X depth) and a 10X Genomics library run on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (∼88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads than mapping against the current reference (CanFam3.1, P < 0.001) and a 15% reduction in variant calls, increasing the chance of identifying true, low-effect-size variants in a genome-wide association study. We believe that by incorporating the cost of producing a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.
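To put the depth figures in the abstract in perspective, the short sketch below converts mean depth into the approximate amount of sequence it implies, using the standard relation depth = total sequenced bases / genome size. Only the 2.4 Gb genome size, the ∼23X and ∼88X depths, and the eight flowcells come from the abstract; the function names are hypothetical, and the results are rough estimates because the reported depths are themselves approximate.

```python
# Rough back-of-the-envelope sketch; function names are illustrative only.
GENOME_SIZE_BP = 2.4e9  # 2.4 gigabase genome, as stated in the abstract


def bases_for_depth(depth, genome_size_bp=GENOME_SIZE_BP):
    """Total sequenced bases implied by a mean depth (depth = bases / genome size)."""
    return depth * genome_size_bp


def yield_per_flowcell(depth, n_flowcells, genome_size_bp=GENOME_SIZE_BP):
    """Average bases each flowcell must contribute to reach the given depth."""
    return bases_for_depth(depth, genome_size_bp) / n_flowcells


if __name__ == "__main__":
    nanopore_gb = bases_for_depth(23) / 1e9        # ~23X long-read depth
    linked_read_gb = bases_for_depth(88) / 1e9     # ~88X 10X Genomics depth
    per_flowcell_gb = yield_per_flowcell(23, 8) / 1e9  # eight R9.4 flowcells

    print(f"Nanopore total:      ~{nanopore_gb:.1f} Gb")
    print(f"Linked-read total:   ~{linked_read_gb:.1f} Gb")
    print(f"Per Nanopore flowcell: ~{per_flowcell_gb:.1f} Gb")
```

Under these assumptions, the long-read component corresponds to roughly 55 Gb of sequence in total, or about 7 Gb per flowcell on average, and the linked-read component to roughly 211 Gb. The calculation ignores read filtering and unmapped reads, so actual raw yields would need to be somewhat higher.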

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/384b/7898556/151c9a5372f7/LSA-2020-00902_Fig1.jpg
