Phasing millions of samples achieves near perfect accuracy, enabling parent-of-origin analyses.

Authors

Williams Cole M, O'Connell Jared, Jewett Ethan, Freyman William A, Gignoux Christopher R, Ramachandran Sohini, Williams Amy L

Affiliations

Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA.

23andMe, Inc., Sunnyvale, CA 94086, USA.

Publication information

HGG Adv. 2025 Jul 22;6(4):100479. doi: 10.1016/j.xhgg.2025.100479.

Abstract

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for genetic analyses. Here, we benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million research-consented 23andMe, Inc. customers and the UK Biobank (UKB). Remarkably, both methods' median switch error rate (SER) (after excluding single SNP switches, which we call "blips") is 0.00% across all tested 23andMe trio children and 0.026% in British samples from UKB. Across UKB samples, switch errors predominantly occur in regions lacking identity-by-descent (IBD) coverage. SHAPEIT and Beagle excel at intra-chromosomal phasing, but lack the ability to phase across chromosomes, motivating us to develop HAPTiC (HAPlotype Tiling and Clustering), an inter-chromosomal phasing method that assigns paternal and maternal variants genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes. HAPTiC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs spectral clustering. We test HAPTiC on 1,022 UKB trios, yielding a median per-site phase error of 0.13% in regions covered by IBD segments (45.1% of sites). We also ran HAPTiC in the 23andMe database and found a median phase error rate of 0.49% in Europeans (100% of sites) and 0.16% in admixed Africans (99.8% of sites). HAPTiC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
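To make the signed-graph idea concrete, here is a minimal sketch of a two-way spectral partition on a signed graph. It is not HAPTiC's implementation: the abstract does not describe how edges are weighted or how the clustering is configured, so the function name, the weight matrix, and the use of the signed Laplacian are all illustrative assumptions. Nodes stand for IBD segments of a focal individual; a positive weight means two segments are believed to come from the same parent, a negative weight means opposite parents.

```python
import numpy as np


def signed_spectral_bipartition(weights):
    """Split the nodes of a signed graph into two groups (+1 / -1).

    Hypothetical helper, not part of any published tool.
    `weights` is a symmetric matrix: weights[i][j] > 0 suggests nodes i and j
    belong to the same group, < 0 suggests opposite groups, 0 means no edge.
    Assumes the graph is connected; isolated nodes make the result ambiguous.
    """
    a = np.asarray(weights, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or not np.allclose(a, a.T):
        raise ValueError("weights must be a square symmetric matrix")
    a = a.copy()
    np.fill_diagonal(a, 0.0)  # ignore self-loops

    # Signed Laplacian: degrees use absolute weights, so that
    # x^T L x = sum over edges of |w_ij| * (x_i - sign(w_ij) * x_j)^2.
    degree = np.abs(a).sum(axis=1)
    laplacian = np.diag(degree) - a

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # smallest eigenvalue minimizes the disagreement with the edge signs.
    _, eigenvectors = np.linalg.eigh(laplacian)
    v = eigenvectors[:, 0]

    labels = np.where(v >= 0, 1, -1)
    # The sign of an eigenvector is arbitrary, so fix node 0 to +1.
    if labels[0] < 0:
        labels = -labels
    return labels


# Made-up example: segments 0 and 1 agree, segments 2 and 3 agree,
# and the two pairs are linked by "opposite parent" edges.
example_weights = [
    [0.0, 3.0, -1.5, 0.0],
    [3.0, 0.0, 0.0, -1.0],
    [-1.5, 0.0, 0.0, 2.0],
    [0.0, -1.0, 2.0, 0.0],
]
print(signed_spectral_bipartition(example_weights).tolist())
```

The two labels only say which segments group together; deciding which group is paternal and which is maternal needs extra information that this sketch does not model.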

