Wang Yejun, MacKenzie Keith D, White Aaron P
Vaccine and Infectious Disease Organization-International Vaccine Centre, University of Saskatchewan, Saskatoon, SK, Canada.
Department of Microbiology and Immunology, University of Saskatchewan, Saskatoon, SK, Canada.
BMC Genomics. 2015 May 7;16(1):359. doi: 10.1186/s12864-015-1555-8.
As sequencing costs continue to fall, RNA-seq has gradually become the first choice for comparative transcriptome studies in bacteria. Unlike microarrays, RNA-seq directly detects cDNA derived from mRNA transcripts at single-nucleotide resolution. This not only allows researchers to determine the absolute expression level of genes, but also conveys information about transcript structure. However, few automated software tools have yet been established for analyzing bacterial transcript structure from large-scale RNA-seq data.
In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1,331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s.
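As a rough illustration of how transcript borders, intergenic distance and co-expression might be combined to group genes into operons, the following Python sketch may help. It is not the pipeline described in this study: the `Gene` class, the `group_operons` and `pearson` helpers, the 50-nucleotide gap cutoff and the 0.8 correlation cutoff are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+" or "-"
    expression: list    # one expression value per RNA-seq library


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def linked(a, b, borders, max_gap, min_corr):
    """Decide whether adjacent genes a and b (a upstream in coordinates)
    plausibly share a transcript. All thresholds are assumptions."""
    if a.strand != b.strand:
        return False
    if b.start - a.end - 1 > max_gap:
        return False
    # A detected TSS or TTS between the two genes splits the operon.
    if any(a.end < pos < b.start for pos in borders.get(a.strand, ())):
        return False
    return pearson(a.expression, b.expression) >= min_corr


def group_operons(genes, borders, max_gap=50, min_corr=0.8):
    """Group genes (sorted by start) into candidate operons.
    `borders` maps strand -> set of TSS/TTS coordinates."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons and linked(operons[-1][-1], gene, borders, max_gap, min_corr):
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return operons


# Toy input: four made-up genes and one made-up border on the "+" strand.
genes = [
    Gene("geneA", 1, 100, "+", [1, 2, 3, 4]),
    Gene("geneB", 120, 300, "+", [2, 4, 6, 8.5]),
    Gene("geneC", 320, 500, "+", [1, 2, 3, 4]),
    Gene("geneD", 600, 800, "-", [4, 3, 2, 1]),
]
borders = {"+": {310}}
operon_names = [[g.name for g in op] for op in group_operons(genes, borders)]
print(operon_names)
```

In this toy input, `geneA` and `geneB` are close together, co-expressed and not separated by a border, so they are grouped. The border at position 310 separates `geneC` from `geneB`, and `geneD` lies on the opposite strand. A real analysis would need strand-aware ordering, statistically justified thresholds and handling of internal promoters.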
Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single-nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and untranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources for comparative analyses with other Salmonella serotypes.