Nasrin Taslima, Hoque Mehboob, Ali Safdar
Clinical and Applied Genomics (CAG) Laboratory, Department of Biological Sciences, Aliah University, Kolkata, India.
Applied Biochemistry Laboratory, Department of Biological Sciences, Aliah University, Kolkata, India.
Gene. 2023 Jan 30;851:147037. doi: 10.1016/j.gene.2022.147037. Epub 2022 Nov 8.
Microsatellites or Simple Sequence Repeats (SSRs) are short tandemly repeated motifs that constitute the most hypervariable regions of genomes. The present study extracts and analyzes the SSRs from the genomes of 21 virophages. Genomic sequences were retrieved from NCBI, and microsatellite data were extracted with the MISA web server. Phylogenetic analysis was performed with MAFFT and MEGAX following standardized protocols. The virophages have circular or linear dsDNA genomes of ~17-30 kb. Genomic GC content ranged from 26.8% (PSAV13) to 51.1% (PSAV12). A total of 3664 SSRs and 488 cSSRs were observed, with an average incidence of 174 and 23 per genome, respectively. The total SSR incidence per genome ranged from 120 (PSAV19) to 264 (PSAV14), and the cSSR (compound SSR) incidence ranged from 8 (PSAV12) to 47 (PSAV14). Mono-nucleotide repeats were the most frequent class (1129 SSRs), followed by di-nucleotide (1036 SSRs) and tri-nucleotide repeats (368 SSRs); however, this order does not hold for every individual genome. Fourteen, 16 and 17 genomes had no tetra-, penta- and hexa-nucleotide repeats, respectively. Among mono-nucleotide repeats, 'A' repeats had the highest representation (~33 per genome on average). Among di-nucleotide repeats, the AT/TA motif was the most frequent (~30 on average), distantly followed by AG/GA and CT/TC (5.6 and 5.5 on average, respectively). A total of 1946 SSRs (76%) were found in coding regions. All genomes had a higher SSR density in non-coding than in coding regions. Fifteen genomes had at least one gene with no SSR. A total of 41 cSSRs occurred in at least two virophages, and 12 cSSRs occurred more than once within the same genome.
The heat map of the genomes corroborates the phylogenetic tree, in which similar sequences (PSAV2, PSAV5, PSAV6, PSAV17 and PSAV18) are positioned together, and it also highlights the diversity of the studied sequences. The conservation of cSSRs across multiple virophages highlights their potential as biomarkers.
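The abstract does not state the MISA search parameters. The Python below is a minimal sketch of how MISA-style SSR and cSSR calls, GC% and SSR density could be computed. It is not the authors' pipeline, and its inputs are assumptions:

- It uses the commonly used MISA-like thresholds: at least 10 repeat units for mono-nucleotide motifs, 6 for di-nucleotide and 5 for tri- to hexa-nucleotide motifs.
- It merges SSRs into a compound SSR when they are separated by a maximum gap of 100 bp.
- These values and all function names (`find_ssrs`, `compound_groups`, `motif_class`, `gc_percent`, `density_per_kb`) are hypothetical.

Motif classes such as AT/TA are collapsed by cyclic rotation only, because AG/GA and CT/TC (reverse complements of each other) are reported separately.

```python
# Minimal MISA-style SSR sketch. Thresholds and all names are hypothetical
# assumptions, not the parameters or code used in the study.

# Assumed minimum repeat units per motif length (MISA-like defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
# Assumed maximum gap (bp) between two SSRs merged into a compound SSR.
MAX_INTERRUPTION = 100


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. 'ATAT' is not primitive)."""
    k = len(motif)
    return not any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Scan left to right for perfect SSRs; return (start, end, motif, repeats), 0-based, end exclusive."""
    seq = seq.upper()
    ssrs, i, n = [], 0, len(seq)
    while i < n:
        hit = None
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for this motif length
            if not is_primitive(motif):
                continue  # e.g. 'AA' is counted as a mono-nucleotide repeat, not a di-
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                hit = (i, i + reps * k, motif, reps)
                break  # keep the shortest qualifying motif length
        if hit:
            ssrs.append(hit)
            i = hit[1]  # resume after the SSR so calls never overlap
        else:
            i += 1
    return ssrs


def compound_groups(ssrs, max_gap=MAX_INTERRUPTION):
    """Group consecutive SSRs separated by <= max_gap bp; each group of 2+ is one compound SSR."""
    groups, current = [], []
    for ssr in ssrs:
        if current and ssr[0] - current[-1][1] <= max_gap:
            current.append(ssr)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [ssr]
    if len(current) > 1:
        groups.append(current)
    return groups


def motif_class(motif):
    """Collapse cyclic rotations ('TA' -> 'AT', 'GA' -> 'AG'); reverse complements stay separate."""
    return min(motif[j:] + motif[:j] for j in range(len(motif)))


def gc_percent(seq):
    """GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def density_per_kb(count, length_bp):
    """SSRs per kilobase of sequence (e.g. of a coding or non-coding region)."""
    return count / (length_bp / 1000.0)


if __name__ == "__main__":
    demo = "C" + "A" * 12 + "GC" + "AT" * 7 + "G"
    ssrs = find_ssrs(demo)
    print(ssrs)  # [(1, 13, 'A', 12), (15, 29, 'AT', 7)]
    print(len(compound_groups(ssrs)))  # 1 (the two SSRs are 2 bp apart)
    print([motif_class(m) for m in ("TA", "GA", "TC")])  # ['AT', 'AG', 'CT']
    print(round(gc_percent(demo), 1))  # 13.3
    print(round(density_per_kb(len(ssrs), len(demo)), 1))  # 66.7
```

To use the sketch, run `find_ssrs` and `compound_groups` over each genome record. This gives per-genome SSR and cSSR incidences. Dividing the totals by the number of genomes gives the averages; for example, 3664/21 ≈ 174.5 and 488/21 ≈ 23.2. Because the thresholds here are assumed, counts from this sketch are not expected to reproduce the published values.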