Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery.

Affiliation

Animal Genomics, ETH Zürich, Zürich, Switzerland.

Publication information

Genome Biol. 2020 Jul 27;21(1):184. doi: 10.1186/s13059-020-02105-0.

Abstract

BACKGROUND

The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the subdivision of cattle into hundreds of breeds make the species ideally suited to investigating the optimal composition of variation-aware references.
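To make "reference allele bias" concrete: when an individual is heterozygous at a site, an unbiased pipeline should assign roughly half of the aligned reads to each allele. A common way to quantify the bias is to average the fraction of reference-supporting reads across heterozygous sites. Values consistently above 0.5 suggest that reads carrying the alternate allele are lost or misplaced during mapping. The sketch below is only an illustration, not the authors' code. The `HetSite` type and the read counts are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HetSite:
    """Hypothetical input: read counts at a site known to be heterozygous."""
    ref_reads: int
    alt_reads: int


def reference_allele_ratio(site: HetSite) -> float:
    """Fraction of reads supporting the reference allele at one heterozygous site."""
    total = site.ref_reads + site.alt_reads
    if total == 0:
        raise ValueError("site has no informative reads")
    return site.ref_reads / total


def mean_reference_ratio(sites: list[HetSite]) -> float:
    """Mean reference-allele ratio; a value near 0.5 indicates no reference allele bias."""
    return mean(reference_allele_ratio(s) for s in sites)


if __name__ == "__main__":
    # Hypothetical counts for three heterozygous sites.
    sites = [HetSite(12, 8), HetSite(15, 10), HetSite(9, 11)]
    print(round(mean_reference_ratio(sites), 3))
```

In this framing, a variation-aware reference should pull the mean ratio closer to 0.5 than the linear reference does for the same reads.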

RESULTS

We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that reads map more accurately to variation-aware references than to the linear reference if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs yield nearly the same improvements in mapping accuracy over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles with an alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels.
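The graphs described above are built from variants pre-selected by allele frequency. The following is a minimal sketch of that selection step. It assumes the variants arrive as plain-text VCF records whose INFO column carries a per-allele `AF` key. This is an assumption: the abstract does not state the input format or the tools used for filtering. The sketch keeps records in which at least one alternate allele has a frequency above 0.03, the threshold quoted for the Brown Swiss graph. For simplicity it filters whole records rather than individual alleles of multiallelic sites. The names `MIN_ALT_AF`, `info_af` and `filter_by_af` are hypothetical.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical constant: the alternate allele frequency threshold quoted in the abstract.
MIN_ALT_AF = 0.03


def info_af(info: str) -> list[float]:
    """Return the per-ALT AF values from a VCF INFO column (empty list if absent)."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "AF" and value:
            # Skip missing values written as "."
            return [float(x) for x in value.split(",") if x != "."]
    return []


def filter_by_af(lines: Iterable[str], min_af: float = MIN_ALT_AF) -> Iterator[str]:
    """Yield header lines and records in which any ALT allele has AF > min_af."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if any(af > min_af for af in info_af(cols[7])):
            yield line


if __name__ == "__main__":
    # Hypothetical in-memory records; only the first data line passes the filter.
    demo = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t50\tPASS\tAF=0.10\n",
        "1\t200\t.\tC\tT\t50\tPASS\tAF=0.01\n",
    ]
    sys.stdout.writelines(filter_by_af(demo))
```

Under these assumptions, the filtered VCF and the ARS-UCD1.2 FASTA would then be passed to the vg toolkit for graph construction. Filtering at the allele level, as the "14 million alleles" figure suggests, would additionally require splitting multiallelic records.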

CONCLUSIONS

We develop the first variation-aware reference graph for an agricultural animal ( https://doi.org/10.5281/zenodo.3759712 ). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.
