CORRECTING MODEL MISSPECIFICATION IN RELATIONSHIP ESTIMATES.

Author information

Jewett Ethan M

Affiliation

23andMe, Inc., Sunnyvale, CA 94086.

Publication information

bioRxiv. 2024 Sep 4:2024.05.13.594005. doi: 10.1101/2024.05.13.594005.

Abstract

The datasets of large genotyping biobanks and direct-to-consumer genetic testing companies contain many related individuals. Until now, it has been widely accepted that the most distant relationships that can be detected are around fifteen degrees (approximately 8th cousins) and that practical relationship estimates have a ceiling around ten degrees (approximately 5th cousins). However, we show that these assumptions are incorrect and that they stem from a misapplication of relationship estimators. In particular, relationship estimators are applied almost exclusively to putative relatives who have been identified because they share detectable tracts of DNA identically by descent (IBD). However, no existing relationship estimator conditions on the event that two individuals share at least one detectable segment of IBD anywhere in the genome. As a result, the relationship estimates obtained using existing estimators are dramatically biased for distant relationships, inferring all sufficiently distant relationships to be around ten degrees regardless of the depth of the true relationship. In addition, existing relationship estimators are derived under a model that assumes that each pair of related individuals shares a single common ancestor (or mating pair of ancestors). This model breaks down for relationships beyond 10 generations in the past because individuals share many thousands of cryptic common ancestors due to pedigree collapse. We first derive a corrected likelihood that conditions on the event that at least one segment is observed between a pair of putative relatives, and we demonstrate that the corrected likelihood largely eliminates the bias in estimates of pairwise relationships and provides a more accurate characterization of the uncertainty in these estimates. We then reformulate the relationship inference problem to account for the fact that individuals share many common ancestors, not just one. We demonstrate that the most distant relationship that can be inferred using IBD may be 200 degrees or more, rather than ten, extending the time to the common ancestor from approximately 300 years in the past to approximately 3,000 years in the past or more. This dramatic increase in the range of relationship estimators makes it possible to infer relationships whose common ancestors lived before historical events such as European settlement of the Americas, the Transatlantic Slave Trade, and the rise and fall of the Roman Empire.
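
The effect of conditioning on "at least one detectable segment" can be illustrated with a toy model. The sketch below is not the estimator derived in the paper. It assumes, purely for illustration, that the number of detectable segments is Poisson distributed, that its mean follows the commonly used approximation a(rm + c)/2^(m-1) for m meioses, and that segment lengths are exponential with mean 100/m cM. All constants and function names (`expected_segments`, `MIN_SEG_CM`, and so on) are hypothetical choices, and only the segment count is used, not segment lengths.

```python
import math

# Illustrative constants (assumptions, not values from the abstract).
GENOME_MORGANS = 35.0   # approximate autosomal genetic length, r
N_CHROMOSOMES = 22      # number of autosomes, c
N_ANCESTORS = 2         # a mating pair of common ancestors, a
MIN_SEG_CM = 5.0        # hypothetical minimum detectable segment length


def expected_segments(degree):
    """Expected number of detectable IBD segments at a given degree (meioses)."""
    # Expected count of segments of any length under the assumed approximation.
    n_all = N_ANCESTORS * (GENOME_MORGANS * degree + N_CHROMOSOMES) / 2 ** (degree - 1)
    # Probability that an exponential segment (mean 100/degree cM) passes the threshold.
    p_detect = math.exp(-degree * MIN_SEG_CM / 100.0)
    return n_all * p_detect


def poisson_pmf(n, lam):
    """Poisson probability of observing exactly n events with mean lam."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def unconditional_likelihood(n, degree):
    """P(N = n) with no ascertainment: the misspecified likelihood."""
    return poisson_pmf(n, expected_segments(degree))


def conditional_likelihood(n, degree):
    """P(N = n | N >= 1): likelihood conditioned on observing a segment."""
    lam = expected_segments(degree)
    # -expm1(-lam) equals 1 - exp(-lam), computed stably for tiny lam.
    return poisson_pmf(n, lam) / -math.expm1(-lam)


def mle(likelihood, n, degrees):
    """Degree that maximizes the given likelihood for n observed segments."""
    return max(degrees, key=lambda d: likelihood(n, d))


degrees = range(2, 41)
naive = mle(unconditional_likelihood, 1, degrees)
corrected = mle(conditional_likelihood, 1, degrees)
print("unconditional MLE:", naive)
print("conditional MLE:", corrected)
```

In this toy setting, a pair ascertained through a single shared segment is always assigned the same moderate degree by the unconditional likelihood, however distant the pair really is. The conditional likelihood instead keeps rising with degree, so a single segment count alone places no upper limit on how distant the relationship may be.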

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0696/11423048/a8ec38382762/nihpp-2024.05.13.594005v2-f0001.jpg
