Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
BMC Genomics. 2011 Feb 18;12:120. doi: 10.1186/1471-2164-12-120.
Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics.
Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. Together with their distribution within the pneumococcal chromosome, this suggests that these repeats are unlikely to be specialised sequences performing a particular role for the host, and are more likely to be parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, directional RNA-seq and RT-PCR showed that such BOX elements are expressed.
BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/.