Provatas Kimonas, Chantzi Nikol, Patsakis Michail, Nayak Akshatha, Mouratidis Ioannis, Georgakopoulos-Soares Ilias
Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA.
Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA.
Comput Struct Biotechnol J. 2024 Oct 26;23:3817-3826. doi: 10.1016/j.csbj.2024.10.041. eCollection 2024 Dec.
Short tandem repeats (STRs) are widespread repetitive elements that serve a number of biological functions and are among the most rapidly mutating regions in the genome. Their distribution varies significantly between taxonomic groups across the tree of life, and they are highly polymorphic within the human population. Advances in sequencing technologies, coupled with decreasing costs, have enabled the generation of an ever-growing number of complete genomes. Additionally, the arrival of accurate long reads has facilitated the generation of Telomere-to-Telomere (T2T) assemblies of complete genomes. Nevertheless, there is no comprehensive database that catalogs the STRs found in each genome across different organisms and across human genomes of diverse ancestries. Here we introduce Microsatellites Explorer, a database of STRs found in the genomes of 117,253 organisms across all major taxonomic groups, 15 T2T genome assemblies of different organisms, and 94 human haplotypes from the human pangenome. The database currently hosts 406,758,798 STR sequences and serves as a centralized, user-friendly repository for performing searches, generating interactive visualizations, and downloading existing STR data for independent analysis. Microsatellites Explorer is implemented as a web portal for browsing, analyzing, and downloading STR data and is publicly available at https://www.microsatellitesexplorer.com.