D'Auria Giuseppe, Pushker Ravindra, Rodriguez-Valera Francisco
Evolutionary Genomics Group and Division de Microbiologia, Universidad Miguel Hernandez Campus de San Juan, 03550 San Juan de Alicante, Spain.
Bioinformatics. 2006 Mar 1;22(5):527-31. doi: 10.1093/bioinformatics/btk033. Epub 2006 Jan 10.
The use of 16S-23S Intergenic Transcribed Spacer (ITS) sequences for bacterial typing has increased recently. The combination of conserved regions, such as tRNA genes or boxes, with hypervariable regions allows intraspecific discrimination of very closely related bacterial strains. However, this mosaic of variability also makes ITS sequences difficult to analyze and compare.
We propose software for studying ITSs with a Word Count based System (IWoCS). A large dataset of 7355 ITS sequences was assembled, and from it a database was built that records every occurrence of the possible n-mers (tags) describing each ITS sequence, with n ranging from 5 to 13, for a total of 32 061 819 entries. Through a web-based interface, users can submit ITS sequences for analysis against the database. The abundance of each n-mer in the database is reported along the sequence in a one-base sliding frame. A dominance plot reflects how common the tags are within different taxonomic levels. The resulting profile identifies highly repeated tags as evolutionarily conserved regions (such as tRNA genes or boxes) and low-frequency tags as regions specifically associated with taxonomic groups. Combined with the taxonomy reports, the dominance and abundance profiles provide a novel tool for using the ITS in bacterial typing and identification.
The database is freely accessible at http://egg.umh.es/iwocs/.
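The word-count idea described in the abstract can be illustrated with a minimal Python sketch. It is not the IWoCS implementation: the function names, the toy sequences, and the taxon labels are all hypothetical, and the "taxon spread" value (the number of distinct taxa containing a tag) is only an assumed stand-in for the dominance measure, whose exact definition the abstract does not give.

```python
from collections import Counter, defaultdict

def kmers(seq, n):
    """Yield every n-mer (tag) of seq using a one-base sliding frame."""
    seq = seq.upper()
    for i in range(len(seq) - n + 1):
        yield seq[i:i + n]

def build_index(records, n):
    """Index (taxon, sequence) pairs.

    Returns the total occurrences of each tag and the set of taxa
    in which each tag was seen.
    """
    abundance = Counter()
    taxa_by_tag = defaultdict(set)
    for taxon, seq in records:
        for tag in kmers(seq, n):
            abundance[tag] += 1
            taxa_by_tag[tag].add(taxon)
    return abundance, taxa_by_tag

def abundance_profile(query, abundance, n):
    """Database abundance of each tag along the query, position by position."""
    return [abundance[tag] for tag in kmers(query, n)]

def taxon_spread_profile(query, taxa_by_tag, n):
    """Number of distinct taxa containing each tag (assumed dominance proxy)."""
    return [len(taxa_by_tag.get(tag, ())) for tag in kmers(query, n)]

# Toy data, invented for illustration only (not real ITS sequences).
RECORDS = [
    ("Escherichia", "ACGTACGTAA"),
    ("Salmonella", "ACGTATTTAA"),
    ("Bacillus", "GGGTACGTCC"),
]
N = 5  # the abstract uses n from 5 to 13

ABUNDANCE, TAXA_BY_TAG = build_index(RECORDS, N)

if __name__ == "__main__":
    query = "ACGTACG"
    print(abundance_profile(query, ABUNDANCE, N))
    print(taxon_spread_profile(query, TAXA_BY_TAG, N))
```

In such a profile, positions with high counts would correspond to tags shared widely across the database (candidate conserved regions), while positions with low counts would point to tags restricted to few sequences or taxa.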