Extending long-range phasing and haplotype library imputation algorithms to large and heterogeneous datasets.

Affiliation

The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK.

Publication information

Genet Sel Evol. 2020 Jul 8;52(1):38. doi: 10.1186/s12711-020-00558-2.

DOI: 10.1186/s12711-020-00558-2
PMID: 32640985
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7346379/
Abstract

BACKGROUND

We describe the latest improvements to the long-range phasing (LRP) and haplotype library imputation (HLI) algorithms, which enable successful phasing both of datasets with one million individuals and of datasets genotyped using different sets of single nucleotide polymorphisms (SNPs). Previous publicly available AlphaPhase implementations of the LRP algorithm could not phase large datasets because of the computational cost of defining surrogate parents by exhaustive all-against-all searches. Furthermore, the AlphaPhase implementations of LRP and HLI were not designed to deal with the large amounts of missing data that are inherent when using multiple SNP arrays.

METHODS

We developed methods that avoid the need for all-against-all searches by performing LRP on subsets of individuals and then concatenating the results. We also extended LRP and HLI algorithms to enable the use of different sets of markers, including missing values, when determining surrogate parents and identifying haplotypes. We implemented and tested these extensions in an updated version of AlphaPhase, and compared its performance to the software package Eagle2.
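The subset idea can be illustrated with a minimal Python sketch. This is not AlphaPhase's implementation. It assumes the usual LRP criterion that two individuals are candidate surrogate parents when they show no opposing homozygous genotypes. Markers that are missing in either individual are skipped, and comparisons are restricted to fixed-size subsets. The missing-value code `9`, the function names, and the `subset_size`, `max_opposing` and `min_shared` parameters are all hypothetical choices for illustration.

```python
from itertools import combinations

import numpy as np

MISSING = 9  # assumed code for an ungenotyped marker (hypothetical convention)


def opposing_homozygotes(g1, g2):
    """Return (number of opposing homozygotes, number of markers genotyped in both).

    Genotypes are coded 0/1/2 (allele dosage); markers missing in either
    individual are ignored rather than counted as conflicts.
    """
    both = (g1 != MISSING) & (g2 != MISSING)
    opposing = both & (np.abs(g1 - g2) == 2)  # 0 versus 2
    return int(opposing.sum()), int(both.sum())


def surrogates_within_subsets(genotypes, subset_size, max_opposing=0, min_shared=1):
    """Find candidate surrogate parents, comparing individuals only within subsets.

    genotypes: integer array of shape (n_individuals, n_markers).
    Returns a dict mapping each individual index to its list of surrogates.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    n = genotypes.shape[0]
    surrogates = {i: [] for i in range(n)}
    for start in range(0, n, subset_size):
        members = range(start, min(start + subset_size, n))
        # All-against-all search, but only inside the current subset.
        for i, j in combinations(members, 2):
            n_opp, n_shared = opposing_homozygotes(genotypes[i], genotypes[j])
            # min_shared stops pairs with too few markers in common from
            # being accepted as surrogates by default.
            if n_shared >= min_shared and n_opp <= max_opposing:
                surrogates[i].append(j)
                surrogates[j].append(i)
    return surrogates
```

With n individuals, an exhaustive search makes n(n-1)/2 pairwise comparisons. Subsets of size s (when s divides n) reduce this to n(s-1)/2, which grows linearly in n for a fixed s. The per-subset results would then be combined, which this sketch does simply by collecting them into one dictionary.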

RESULTS

A simulated dataset of one million individuals genotyped with the same 6711 SNPs for a single chromosome took AlphaPhase less than a day to phase, compared to more than seven days for Eagle2. The percentage of correctly phased alleles at heterozygous loci was 90.2% for AlphaPhase and 99.9% for Eagle2. A larger dataset of one million individuals genotyped with 49,579 SNPs for a single chromosome took AlphaPhase 23 days to phase, with 89.9% of alleles at heterozygous loci phased correctly. Phasing accuracy was generally lower for datasets with different sets of markers than for datasets with one set of markers. For a simulated dataset with three sets of markers, 1.5% of alleles at heterozygous positions were phased incorrectly, compared to 0.4% with one set of markers.
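The accuracy figures above are percentages of alleles at heterozygous loci. One plausible way to compute such rates for a single individual is sketched below in Python. The abstract does not give the exact definition the authors used, so the details are assumptions: the code `9` for an allele left unphased, the function name, and the choice to score against whichever of the two haplotype orientations agrees best.

```python
import numpy as np

UNPHASED = 9  # assumed code for an allele the phasing left unresolved


def phasing_rates(true_haps, est_haps):
    """Percentages of heterozygous loci phased correctly, incorrectly, or not at all.

    true_haps, est_haps: arrays of shape (2, n_loci) with alleles coded 0/1;
    est_haps may also contain UNPHASED. Only loci that are truly heterozygous
    are scored, since homozygous loci carry no phase information.
    """
    true_haps = np.asarray(true_haps)
    est_haps = np.asarray(est_haps)
    het = true_haps[0] != true_haps[1]
    n_het = int(het.sum())
    if n_het == 0:
        return {"correct": 0.0, "incorrect": 0.0, "unphased": 0.0}

    called = het & (est_haps[0] != UNPHASED) & (est_haps[1] != UNPHASED)

    def n_match(reference):
        # Loci where both estimated alleles agree with the reference orientation.
        agree = (est_haps[0] == reference[0]) & (est_haps[1] == reference[1])
        return int((called & agree).sum())

    # The labelling of the two haplotypes is arbitrary, so take the better of
    # the two orientations (a simplification that ignores switch errors).
    n_correct = max(n_match(true_haps), n_match(true_haps[::-1]))
    n_called = int(called.sum())
    return {
        "correct": 100.0 * n_correct / n_het,
        "incorrect": 100.0 * (n_called - n_correct) / n_het,
        "unphased": 100.0 * (n_het - n_called) / n_het,
    }
```

Reporting unphased alleles separately matters when comparing tools. Under this definition, a method that declines to phase uncertain loci can have a low correct rate and a low incorrect rate at the same time.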

CONCLUSIONS

The improved LRP and HLI algorithms enable AlphaPhase to quickly and accurately phase very large and heterogeneous datasets. AlphaPhase is an order of magnitude faster than the other tested packages, although Eagle2 showed a higher level of phasing accuracy. The speed gain will make phasing achievable for very large genomic datasets in livestock, enabling more powerful breeding and genetics research and application.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7346379/51aaba982cb6/12711_2020_558_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7346379/e1005760c675/12711_2020_558_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7346379/fcc5987973ec/12711_2020_558_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7346379/c1a6e5ba5a54/12711_2020_558_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7346379/142ae0f55e09/12711_2020_558_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7346379/35fd4788ae09/12711_2020_558_Fig6_HTML.jpg
