A large structural variant collection in Holstein cattle and associated database for variant discovery, characterization, and application.

Affiliations

Agricultural, Food & Nutritional Science, University of Alberta, Edmonton, AB, T6G 2P5, Canada.

Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada.

Publication information

BMC Genomics. 2024 Sep 30;25(1):903. doi: 10.1186/s12864-024-10812-2.

PMID:39350025

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11440700/

Abstract

BACKGROUND

Structural variants (SVs) such as deletions, duplications, and insertions are known to contribute to phenotypic variation but remain challenging to identify and genotype. A more complete, accessible, and assessable collection of SVs will assist efforts to study SV function in cattle and to incorporate SV genotyping into animal evaluation.

RESULTS

In this work we produced a large and deeply characterized collection of SVs in Holstein cattle. We used two popular SV callers (Manta and Smoove) and publicly available Illumina whole-genome sequence (WGS) read sets from 310 samples (290 male, 20 female, mean 20X coverage). Manta and Smoove identified 31K and 68K SVs, respectively. In total, the SVs cover 5% (Manta) and 6% (Smoove) of the reference genome, compared with the 1% affected by SNPs and indels. SV genotypes from each caller were confirmed to accurately recapitulate animal relationships estimated from WGS SNP genotypes in the same dataset. Manta genotypes outperformed Smoove genotypes, and deletions outperformed duplications. To support efforts to link the SVs to phenotypic variation, overlapping and tag SNPs were identified for each SV. This was done using genotype sets extracted from the WGS results that correspond to two bovine SNP chips (BovineSNP50 and BovineHD). Overlapping BovineHD panel SNPs were found for 9% (Manta) and 11% (Smoove) of the SVs, while 21% (Manta) and 9% (Smoove) have BovineHD panel tag SNPs. We created a custom interactive database (https://svdb-dc.pslab.ca) to enable the evaluation and prioritization of SVs for further study. It contains the identified sequence variants with extensive annotations, gene feature information, and BAM file content for all SVs. Illustrative examples involving the genes POPDC3, ORM1, G2E3, FANCI, TFB1M, FOXC2, N4BP2, GSTA3, and COPA show how this resource can be used in several ways: to find well-supported genic SVs, determine SV breakpoints, design genotyping approaches, and identify processed pseudogenes masquerading as deletions.
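
The abstract reports three kinds of summary statistics: genome coverage by SVs, SVs with overlapping SNPs, and SVs with tag SNPs. It does not define how each was computed. The sketch below shows one plausible reading of these quantities and makes no claim to reproduce the authors' pipeline. It rests on the following assumptions:

- SVs are treated as 1-based, inclusive intervals.
- Coverage is the union of SV intervals divided by genome length.
- An "overlapping" SNP is one whose position lies inside an SV.
- A "tag" SNP is one whose 0/1/2 genotype dosages reach r² ≥ 0.8 with the SV's dosages, where r² is the squared Pearson correlation.

The `Interval` type, all function names, and the 0.8 cut-off are hypothetical. The study may also restrict tag SNP searches to a window around each SV, which this sketch does not do.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interval:
    """Hypothetical SV record; coordinates assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int


def covered_fraction(intervals, genome_length):
    """Fraction of a genome spanned by the union of intervals.

    Overlapping or adjacent intervals are merged so each base is counted once.
    """
    by_chrom = {}
    for iv in intervals:
        by_chrom.setdefault(iv.chrom, []).append((iv.start, iv.end))
    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:      # overlaps or touches: extend block
                cur_end = max(cur_end, end)
            else:                         # gap: close the current block
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
    return covered / genome_length


def overlapping_snps(sv, snp_positions):
    """IDs of SNPs whose (chrom, pos) falls inside the SV interval."""
    return [snp_id for snp_id, (chrom, pos) in snp_positions.items()
            if chrom == sv.chrom and sv.start <= pos <= sv.end]


def r_squared(x, y):
    """Squared Pearson correlation between two 0/1/2 dosage vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: r^2 undefined, treated as no tag
    return sxy * sxy / (sxx * syy)


def tag_snps(sv_dosages, snp_dosages, threshold=0.8):
    """IDs of SNPs whose r^2 with the SV genotypes reaches the threshold.

    The 0.8 default is a hypothetical cut-off, not a value from the study.
    """
    return [snp_id for snp_id, dosages in snp_dosages.items()
            if r_squared(sv_dosages, dosages) >= threshold]
```

For example, take two overlapping calls on chromosome "1" (100-199 and 150-249) and one call on chromosome "2" (10-19). Together they span 160 bp, so on a toy 1,000 bp genome `covered_fraction` returns 0.16. Merging intervals before summing prevents overlapping calls from inflating the coverage percentage. Squaring the correlation means a SNP in perfect negative LD with the SV also counts as a tag.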

CONCLUSIONS

The resources developed through this study can be used to explore sequence variation in Holstein cattle and to develop strategies for studying SVs of interest. The lack of overlapping and tag SNPs from commonly used SNP chips for most of the SVs suggests that other genotyping approaches will be needed (for example direct genotyping) to understand their potential contributions to phenotype. The included SV genotype assessments point to challenges in characterizing SVs, especially duplications, using short-read data and support ongoing efforts to better characterize cattle genomes through long-read sequencing. Lastly, the identification of previously known functional SVs and additional CDS-overlapping SVs supports the phenotypic relevance of this dataset.
