Benchmarking phasing software with a whole-genome sequenced cattle pedigree.

Affiliations

Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège (B34), 4000, Liège, Belgium.

Animal Genomics, ETH Zürich, 8092, Zürich, Switzerland.

Publication information

BMC Genomics. 2022 Feb 15;23(1):130. doi: 10.1186/s12864-022-08354-6.

DOI:10.1186/s12864-022-08354-6
PMID:35164677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8845340/
Abstract

BACKGROUND

Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We herein took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. This data represents a typical example of a livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium.

RESULTS

After stringent filtering of our sequence data, we evaluated several population-based phasing programs including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. To that end we used 98 individuals having both parents sequenced for validation. Their haplotypes reconstructed based on Mendelian segregation rules were considered the gold standard to assess the performance of population-based methods in two scenarios. In the first one, only these 98 individuals were phased, while in the second one, all the 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both methods resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable since the methods were evaluated with a map of 8,400,000 SNPs, and this corresponds to only one switch error every 40,000 phased informative markers. When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors.
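
The switch error count (SEC) and rate (SER) mentioned above can be illustrated with a minimal Python sketch. It assumes one common definition: only heterozygous sites are informative, and a switch error is counted whenever the inferred phase flips relative to the gold standard between two consecutive heterozygous sites. The function name `switch_errors` and the input layout (a list of allele pairs per individual) are hypothetical and are not taken from the paper or from any of the evaluated programs.

```python
from typing import List, Sequence, Tuple


def switch_errors(
    truth: Sequence[Sequence[int]],
    inferred: Sequence[Sequence[int]],
) -> Tuple[int, float]:
    """Return (switch error count, switch error rate) for one individual.

    Each element of `truth` and `inferred` is a pair of alleles
    (allele on haplotype 1, allele on haplotype 2) at one marker.
    This layout is an assumption made for illustration.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same markers")

    # For every heterozygous site, record whether the inferred haplotypes
    # are in the same orientation as the gold-standard haplotypes.
    orientation: List[bool] = []
    for t, p in zip(truth, inferred):
        if t[0] == t[1]:
            continue  # homozygous sites carry no phase information
        if sorted(t) != sorted(p):
            raise ValueError("unphased genotypes differ between inputs")
        orientation.append(tuple(t) == tuple(p))

    # A switch error is a change of orientation between consecutive
    # heterozygous sites.
    sec = sum(a != b for a, b in zip(orientation, orientation[1:]))
    opportunities = len(orientation) - 1
    ser = sec / opportunities if opportunities > 0 else 0.0
    return sec, ser


if __name__ == "__main__":
    truth = [(0, 1), (1, 0), (0, 0), (0, 1), (1, 0)]
    inferred = [(0, 1), (1, 0), (0, 0), (1, 0), (0, 1)]
    print(switch_errors(truth, inferred))
```

In this toy input the phase flips once, between the second and third heterozygous sites, so the count is 1 over 3 opportunities. Real pipelines typically work from phased VCF files; that parsing step is left out here.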

CONCLUSIONS

We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes.
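
The distinction between local and long-range accuracy can be made concrete with a second sketch. It shows one plausible reading of the "probability that there is no phasing error between a pair of SNPs as a function of their distance": a pair of heterozygous sites is treated as correctly phased when both sites have the same orientation relative to the gold standard, and pairs are grouped into distance bins. The function name, the `bin_size` parameter and the per-site orientation flags are assumptions for illustration; the paper's exact procedure may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence


def pairwise_accuracy_by_distance(
    positions: Sequence[int],
    orientation: Sequence[bool],
    bin_size: int,
) -> Dict[int, float]:
    """Fraction of correctly phased site pairs per distance bin.

    positions   : sorted base-pair positions of heterozygous sites
    orientation : True where the inferred phase matches the gold standard
                  at that site, False where it is flipped
    bin_size    : width of each distance bin in base pairs (hypothetical)
    Keys of the result are the lower bounds of the distance bins.
    """
    if len(positions) != len(orientation):
        raise ValueError("positions and orientation must have equal length")

    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    n = len(positions)
    # Quadratic loop over all pairs: fine for a sketch, too slow for
    # millions of markers without subsampling.
    for i in range(n):
        for j in range(i + 1, n):
            b = (positions[j] - positions[i]) // bin_size
            total[b] += 1
            # Equal orientation means an even number of switches between
            # the two sites, so their relative phase is correct.
            if orientation[i] == orientation[j]:
                correct[b] += 1
    return {b * bin_size: correct[b] / total[b] for b in sorted(total)}


if __name__ == "__main__":
    pos = [100, 200, 1200, 1300]
    flags = [True, True, False, False]
    print(pairwise_accuracy_by_distance(pos, flags, bin_size=1000))
```

With a single switch in the middle of the toy region, nearby pairs remain correctly phased while every pair spanning the switch is wrong, which is why a tool can look accurate at short distances and still break long haplotypes.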


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/8845340/83219583a6e1/12864_2022_8354_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/8845340/0c291f0b1b7e/12864_2022_8354_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/8845340/43a4382e6402/12864_2022_8354_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/8845340/5e9acab29a7e/12864_2022_8354_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4059/8845340/aab2e2333c6b/12864_2022_8354_Fig5_HTML.jpg

Similar articles

1
Benchmarking phasing software with a whole-genome sequenced cattle pedigree.
BMC Genomics. 2022 Feb 15;23(1):130. doi: 10.1186/s12864-022-08354-6.
2
A comparison of different algorithms for phasing haplotypes using Holstein cattle genotypes and pedigree data.
J Dairy Sci. 2017 Apr;100(4):2837-2849. doi: 10.3168/jds.2016-11590. Epub 2017 Feb 1.
3
Phasing quality assessment in a brown layer population through family- and population-based software.
BMC Genet. 2019 Jul 17;20(1):57. doi: 10.1186/s12863-019-0759-3.
4
A strategy to improve phasing of whole-genome sequenced individuals through integration of familial information from dense genotype panels.
Genet Sel Evol. 2017 May 16;49(1):46. doi: 10.1186/s12711-017-0321-6.
5
trioPhaser: using Mendelian inheritance logic to improve genomic phasing of trios.
BMC Bioinformatics. 2021 Nov 22;22(1):559. doi: 10.1186/s12859-021-04470-4.
6
Extending long-range phasing and haplotype library imputation algorithms to large and heterogeneous datasets.
Genet Sel Evol. 2020 Jul 8;52(1):38. doi: 10.1186/s12711-020-00558-2.
7
Recombination locations and rates in beef cattle assessed from parent-offspring pairs.
Genet Sel Evol. 2014 May 29;46(1):34. doi: 10.1186/1297-9686-46-34.
8
Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle.
BMC Genomics. 2014 Aug 27;15(1):728. doi: 10.1186/1471-2164-15-728.
9
Comparison of phasing strategies for whole human genomes.
PLoS Genet. 2018 Apr 5;14(4):e1007308. doi: 10.1371/journal.pgen.1007308. eCollection 2018 Apr.
10
A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes.
Genet Sel Evol. 2011 Mar 10;43(1):12. doi: 10.1186/1297-9686-43-12.

Cited by

1
Global and local ancestry estimation in a captive baboon colony.
PLoS One. 2024 Jul 3;19(7):e0305157. doi: 10.1371/journal.pone.0305157. eCollection 2024.
2
The shared ancestry between the C9orf72 hexanucleotide repeat expansion and intermediate-length alleles using haplotype sharing trees and HAPTK.
Am J Hum Genet. 2024 Feb 1;111(2):383-392. doi: 10.1016/j.ajhg.2023.12.019. Epub 2024 Jan 18.
3
An organism-wide ATAC-seq peak catalog for the bovine and its use to identify regulatory variants.
Genome Res. 2023 Oct;33(10):1848-1864. doi: 10.1101/gr.277947.123. Epub 2023 Sep 26.
4
Evaluation of Whole-Genome Sequence Imputation Strategies in Korean Hanwoo Cattle.
Animals (Basel). 2022 Sep 1;12(17):2265. doi: 10.3390/ani12172265.
