A Log-Ratio Biplot Approach for Exploring Genetic Relatedness Based on Identity by State.

Authors

Graffelman Jan, Galván Femenía Iván, de Cid Rafael, Barceló Vidal Carles

Affiliations

Department of Statistics and Operations Research, Technical University of Catalonia, Barcelona, Spain.

Department of Biostatistics, University of Washington, Seattle, WA, United States.

Publication information

Front Genet. 2019 Apr 24;10:341. doi: 10.3389/fgene.2019.00341. eCollection 2019.

DOI:10.3389/fgene.2019.00341

PMID:31068965

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6491861/

Abstract

The detection of cryptic relatedness in large population-based cohorts is of great importance in genome research. The usual approach for detecting closely related individuals is to plot allele-sharing statistics, based on identity by state or identity by descent, in a two-dimensional scatterplot. This approach ignores that allele-sharing data across individuals in fact have a higher dimensionality, and it also disregards the compositional nature of the underlying counts of shared genotypes. In this paper we develop biplot methodology based on log-ratio principal component analysis that overcomes these restrictions. This leads to entirely new graphics that are particularly useful for exploring relatedness in genetic databases from homogeneous populations. The proposed method can be applied iteratively, acting as a magnifying glass for more remote relationships that are harder to classify. Datasets from the 1000 Genomes Project and the Genomes For Life-GCAT Project are used to illustrate the proposed method. The discriminatory power of the log-ratio biplot approach is compared with that of the classical plots in a simulation study. In a non-inbred homogeneous population, the classification rate of the log-ratio principal component approach exceeds that of the classical graphics across the whole allele frequency spectrum, using only identity by state. Under these conditions, simulations show that with 35,000 independent bi-allelic variants, log-ratio principal component analysis combined with discriminant analysis can correctly classify relationships up to and including the fourth degree.
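The core idea of the abstract, treating the per-pair counts of shared genotypes as a composition and analysing them with log-ratio principal component analysis, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a three-part composition of identity-by-state counts (0, 1 or 2 shared alleles), a centred log-ratio (clr) transform, and an additive pseudo-count of 0.5 to handle zero counts. The function names, the pseudo-count, and the simulated genotypes are all assumptions made for this example.

```python
import numpy as np

def ibs_counts(g1, g2):
    """Count variants at which two individuals share 0, 1 or 2 alleles IBS.

    g1, g2: genotypes at bi-allelic variants coded as allele counts (0, 1, 2).
    """
    shared = 2 - np.abs(np.asarray(g1) - np.asarray(g2))
    return np.bincount(shared, minlength=3)

def clr(counts, pseudo=0.5):
    """Centred log-ratio transform of each row (pseudo-count is an assumption)."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudo)
    return logx - logx.mean(axis=1, keepdims=True)

def log_ratio_pca(counts, pseudo=0.5):
    """PCA of clr-transformed compositions via SVD of the column-centred matrix."""
    z = clr(counts, pseudo)
    zc = z - z.mean(axis=0)
    u, s, vt = np.linalg.svd(zc, full_matrices=False)
    scores = u * s              # pairs of individuals, plotted as points
    loadings = vt.T             # IBS parts, plotted as biplot rays
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained

# Toy data (hypothetical): two unrelated parents, their child, one unrelated person.
rng = np.random.default_rng(0)
n_var = 2000
p = rng.uniform(0.1, 0.5, n_var)            # allele frequencies per variant

def person():
    return rng.binomial(1, p), rng.binomial(1, p)   # two independent haplotypes

def gamete(haps):
    pick = rng.integers(0, 2, n_var)                 # independent variants
    return np.where(pick == 0, haps[0], haps[1])

a, b, c = person(), person(), person()
geno = {
    "A": a[0] + a[1],
    "B": b[0] + b[1],
    "child": gamete(a) + gamete(b),
    "C": c[0] + c[1],
}
names = list(geno)
pairs = [(i, j) for k, i in enumerate(names) for j in names[k + 1:]]
counts = np.array([ibs_counts(geno[i], geno[j]) for i, j in pairs])

scores, loadings, explained = log_ratio_pca(counts)
for (i, j), row, sc in zip(pairs, counts, scores):
    print(f"{i}-{j}: IBS counts {row.tolist()}, PC1 {sc[0]:.2f}")
```

In this toy setting the parent-offspring pairs never share zero alleles, so their first count is exactly zero and they separate clearly along the first principal axis. Because a three-part clr composition has rank at most two, only two components carry variance here; the higher dimensionality mentioned in the abstract would require a richer composition of genotype-pair counts.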
