STICI: Split-Transformer with integrated convolutions for genotype imputation.

Authors

Mowlaei Mohammad Erfan, Li Chong, Jamialahmadi Oveis, Dias Raquel, Chen Junjie, Jamialahmadi Benyamin, Rebbeck Timothy Richard, Carnevale Vincenzo, Kumar Sudhir, Shi Xinghua

Affiliations

Computer & Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA.

Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, Wallenberg Laboratory, University of Gothenburg, Gothenburg, Sweden.

Publication

Nat Commun. 2025 Jan 31;16(1):1218. doi: 10.1038/s41467-025-56273-3.

Abstract

Despite advances in sequencing technologies, genome-scale datasets often contain missing bases and genomic segments, hindering downstream analyses. Genotype imputation addresses this issue and has been a cornerstone pre-processing step in genetic and genomic studies. Although various methods have been widely adopted for genotype imputation, it remains challenging to impute certain genomic regions and large structural variants. Here, we present a transformer-based framework, named STICI, for accurate genotype imputation. STICI models automatically learn genome-wide patterns of linkage disequilibrium, evidenced by much higher imputation accuracy in regions with highly linked variants. Our imputation results on the human 1000 Genomes Project and non-human genomes show that STICI can achieve high imputation accuracy comparable to the state-of-the-art genotype imputation methods, with the additional capability to impute multi-allelic variants and various types of genetic variants. STICI can be trained for any collection of genomes automatically using self-supervision. Moreover, STICI shows excellent performance without needing any special presuppositions about the underlying patterns in collections of non-human genomes, pointing to adaptability and applications of STICI to impute missing genotypes in any species.
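The abstract says STICI is trained by self-supervision but does not describe the procedure. One common way to set up such training, shown here only as a minimal sketch and not as the authors' method, is to hide a random subset of known genotypes and score a model on recovering them. The names `mask_genotypes` and `imputation_accuracy`, the `MISSING` sentinel, the integer allele coding, and the mask rate are all illustrative assumptions.

```python
import random

MISSING = -1  # hypothetical sentinel marking a hidden (masked) genotype call


def mask_genotypes(haplotype, mask_rate, rng):
    """Hide a random subset of sites in one haplotype.

    haplotype: list of integer allele codes (0 = reference, 1+ = alternate
               alleles, so multi-allelic sites are representable).
    mask_rate: fraction of sites to hide, between 0 and 1.
    Returns the masked copy and the sorted list of hidden positions.
    """
    n_sites = len(haplotype)
    n_masked = max(1, round(mask_rate * n_sites))
    positions = sorted(rng.sample(range(n_sites), n_masked))
    masked = list(haplotype)
    for pos in positions:
        masked[pos] = MISSING
    return masked, positions


def imputation_accuracy(truth, predicted, positions):
    """Fraction of hidden sites where the predicted allele equals the truth."""
    correct = sum(1 for pos in positions if truth[pos] == predicted[pos])
    return correct / len(positions)


# Toy data (made up for illustration), including one multi-allelic code (2).
rng = random.Random(0)
haplotype = [0, 1, 0, 0, 2, 1, 0, 1, 0, 0]
masked, positions = mask_genotypes(haplotype, mask_rate=0.3, rng=rng)
```

A model would take `masked` as input and be trained to reproduce `haplotype` at `positions`. Because the targets come from the data itself, no externally labeled reference is required, which is the sense in which such training is self-supervised.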

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82d6/11785734/3954a7874ce0/41467_2025_56273_Fig1_HTML.jpg
