
STICI: Split-Transformer with integrated convolutions for genotype imputation.

Authors

Mowlaei Mohammad Erfan, Li Chong, Jamialahmadi Oveis, Dias Raquel, Chen Junjie, Jamialahmadi Benyamin, Rebbeck Timothy Richard, Carnevale Vincenzo, Kumar Sudhir, Shi Xinghua

Affiliations

Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA.

Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, Wallenberg Laboratory, University of Gothenburg, Gothenburg, Sweden.

Publication details

Nat Commun. 2025 Jan 31;16(1):1218. doi: 10.1038/s41467-025-56273-3.

DOI:10.1038/s41467-025-56273-3
PMID:39890780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11785734/
Abstract

Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
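
The abstract mentions self-supervised training and linkage disequilibrium (LD) without giving details. The sketch below is a generic, standard-library illustration of both ideas and is not the STICI implementation: it hides random genotypes so the hidden values can serve as training targets, scores predictions only at the hidden positions, and computes the usual r² LD statistic between two biallelic variants. The function names, the `MISSING` sentinel, and the 0/1 haplotype encoding are assumptions made for this example.

```python
import random

MISSING = -1  # hypothetical sentinel marking a masked (hidden) genotype


def mask_genotypes(haplotype, mask_rate, rng):
    """Hide each position with probability mask_rate.

    Returns the masked copy and the list of hidden positions; the
    original values at those positions act as self-supervised targets.
    """
    masked = list(haplotype)
    positions = [i for i in range(len(haplotype)) if rng.random() < mask_rate]
    for i in positions:
        masked[i] = MISSING
    return masked, positions


def masked_accuracy(truth, predicted, positions):
    """Fraction of hidden positions where the prediction matches the truth."""
    if not positions:
        return None
    correct = sum(1 for i in positions if truth[i] == predicted[i])
    return correct / len(positions)


def ld_r2(a, b):
    """r^2 between two biallelic variants coded 0/1 across haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    return 0.0 if denom == 0 else d * d / denom


if __name__ == "__main__":
    rng = random.Random(0)
    hap = [0, 1, 1, 0, 1, 0, 0, 1]
    masked, hidden = mask_genotypes(hap, 0.5, rng)
    print(masked, hidden)
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
```

In a real pipeline a model would be trained to predict the hidden values from the visible ones; here `masked_accuracy` only shows how such predictions would be scored, and `ld_r2` shows the quantity that "highly linked variants" refers to.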


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/3954a7874ce0/41467_2025_56273_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/434a4242051a/41467_2025_56273_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/e046fd75c196/41467_2025_56273_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/98d387416dba/41467_2025_56273_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/393a73649247/41467_2025_56273_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/4be874d81fa5/41467_2025_56273_Fig6_HTML.jpg
