SALAI-Net: species-agnostic local ancestry inference network.

Affiliations

Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain.

Department of Biomedical Data Science, Stanford Medical School.

Publication information

Bioinformatics. 2022 Sep 16;38(Suppl_2):ii27-ii33. doi: 10.1093/bioinformatics/btac464.

Abstract

MOTIVATION

Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications, including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well across species, chromosomes or even ancestry groups, and must be re-trained for each new setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications.

RESULTS

We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity-by-descent methods, SALAI-Net estimates a population label for each segment of DNA by matching it against reference haplotypes, which makes the technique both interpretable and fast. We benchmark our models on human whole-genome data and test their ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing across different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods.
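To make the reference-matching idea concrete, here is a minimal sketch in plain Python. It is not the SALAI-Net implementation: the function names, the fixed window size, the match-fraction similarity and the toy haplotypes are all assumptions made for illustration. Each window of a query haplotype is compared with every labeled reference haplotype, and the window receives the label of the population whose closest reference matches best. A small helper also shows how balanced accuracy, the metric named above, averages per-class recall.

```python
from collections import defaultdict


def window_match_lai(query, references, labels, window_size):
    """Toy reference-matching LAI (illustrative only, not the SALAI-Net model).

    query:      list of 0/1 alleles for one haplotype
    references: list of reference haplotypes, each as long as the query
    labels:     population label of each reference haplotype
    Returns one predicted label per window.
    """
    predictions = []
    for start in range(0, len(query), window_size):
        end = min(start + window_size, len(query))
        # Best fraction of matching alleles seen so far for each population.
        best = defaultdict(float)
        for ref, label in zip(references, labels):
            matches = sum(q == r for q, r in zip(query[start:end], ref[start:end]))
            best[label] = max(best[label], matches / (end - start))
        # Choose the population whose closest reference matches best;
        # ties go to the alphabetically first label so results are deterministic.
        predictions.append(max(sorted(best), key=lambda pop: best[pop]))
    return predictions


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so rare ancestries weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: two reference haplotypes per population.
references = [
    [0, 0, 0, 0, 0, 0, 0, 0],  # population "A"
    [0, 0, 0, 1, 0, 0, 0, 0],  # population "A"
    [1, 1, 1, 1, 1, 1, 1, 1],  # population "B"
    [1, 1, 0, 1, 1, 1, 1, 1],  # population "B"
]
labels = ["A", "A", "B", "B"]
# Admixed query: the first half resembles "A", the second half resembles "B".
query = [0, 0, 0, 1, 1, 1, 1, 1]

predicted = window_match_lai(query, references, labels, window_size=4)
print(predicted)                                  # ['A', 'B']
print(balanced_accuracy(["A", "B"], predicted))   # 1.0
```

Because the prediction for each window can be traced back to a specific best-matching reference haplotype, a scheme of this kind is easy to inspect, which is the sense in which a matching-based approach can be called interpretable.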

AVAILABILITY AND IMPLEMENTATION

We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).

SUPPLEMENTARY INFORMATION

Supplementary data are available from Bioinformatics online.
