Inferring ancestry with the hierarchical soft clustering approach tangleGen.

Authors

Burger Klara Elisabeth, Klepper Solveig, von Luxburg Ulrike, Baumdicker Franz

Affiliations

Department of Computer Science, University of Tübingen, 72074 Tübingen, Germany.

Tübingen AI Center, 72076 Tübingen, Germany.

Publication information

Genome Res. 2024 Dec 23;34(12):2244-2255. doi: 10.1101/gr.279399.124.

Abstract

Understanding the genetic ancestry of populations is central to numerous scientific and societal fields. It contributes to a better understanding of human evolutionary history, advances personalized medicine, aids in forensic identification, and allows individuals to connect to their genealogical roots. Existing methods, such as ADMIXTURE, have significantly improved our ability to infer ancestries. However, these methods typically work with a fixed number of independent ancestral populations. As a result, they provide insight into genetic admixture, but do not include a hierarchical interpretation. In particular, the intricate ancestral population structures remain difficult to unravel. Alternative methods with a consistent inheritance structure, such as hierarchical clustering, may offer benefits in terms of interpreting the inferred ancestries. Here, we present tangleGen, a soft clustering tool that transfers the hierarchical machine learning framework Tangles, which leverages graph theoretical concepts, to the field of population genetics. The hierarchical perspective of tangleGen on the composition and structure of populations improves the interpretability of the inferred ancestral relationships. Moreover, tangleGen adds a new layer of explainability, as it allows identifying the single-nucleotide polymorphisms that are responsible for the clustering structure. We demonstrate the capabilities and benefits of tangleGen for the inference of ancestral relationships, using both simulated data and data from the 1000 Genomes Project.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/209d/11694745/18f82c7879a0/2244f01.jpg
