SALAI-Net: species-agnostic local ancestry inference network.

Affiliations

Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain.

Department of Biomedical Data Science, Stanford Medical School.

Publication information

Bioinformatics. 2022 Sep 16;38(Suppl_2):ii27-ii33. doi: 10.1093/bioinformatics/btac464.

DOI:10.1093/bioinformatics/btac464
PMID:36124792
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9486591/
Abstract

MOTIVATION

Local ancestry inference (LAI) is the high resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications.

RESULTS

We present SALAI-Net, a portable statistical LAI method that can be applied on any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity by descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on whole-genome data of humans and we test these models' ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM memory than competing methods.
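As a rough illustration of the reference matching idea described above, the following minimal sketch labels each fixed-size window of a query haplotype with the population of its most similar reference haplotype, and computes balanced accuracy as the mean per-class recall. It is not the SALAI-Net implementation: the function names `window_match_lai` and `balanced_accuracy`, the 0/1 allele encoding, the fixed window size and the simple best-match rule are all assumptions made for this example, and the published method may differ in every one of these choices.

```python
import numpy as np


def window_match_lai(query, references, ref_labels, window_size):
    """Assign one ancestry label per window by reference matching.

    query:       (n_snps,) array of 0/1 alleles for one haplotype
    references:  (n_refs, n_snps) array of 0/1 alleles
    ref_labels:  (n_refs,) integer population label of each reference
    window_size: number of SNPs per window (hypothetical parameter)
    """
    query = np.asarray(query)
    references = np.asarray(references)
    ref_labels = np.asarray(ref_labels)
    populations = np.unique(ref_labels)

    window_labels = []
    for start in range(0, query.shape[0], window_size):
        stop = min(start + window_size, query.shape[0])
        # Fraction of alleles in this window that each reference shares
        # with the query haplotype.
        similarity = (references[:, start:stop] == query[start:stop]).mean(axis=1)
        # Keep the best-matching reference within each population, then
        # pick the population whose best reference matches most closely.
        best_per_pop = [similarity[ref_labels == p].max() for p in populations]
        window_labels.append(populations[int(np.argmax(best_per_pop))])
    return np.array(window_labels)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare ancestries weigh as much as common ones."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy data: two reference haplotypes, one per population.
references = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],  # population 0
    [1, 1, 1, 1, 1, 1, 1, 1],  # population 1
])
ref_labels = np.array([0, 1])

# Admixed query: first half resembles population 0, second half population 1.
query = np.array([0, 0, 0, 1, 1, 1, 1, 0])

predicted = window_match_lai(query, references, ref_labels, window_size=4)
print(predicted)                                  # [0 1]
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

Because each window label traces back to a specific reference haplotype, a prediction of this kind can be inspected directly, which is the sense in which a matching-based approach is interpretable. The second printed value shows why balanced accuracy is informative when classes are imbalanced: predicting only the majority label scores 0.75 in plain accuracy but 0.5 in balanced accuracy.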

AVAILABILITY AND IMPLEMENTATION

We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).

SUPPLEMENTARY INFORMATION

Supplementary data are available from Bioinformatics online.
