Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
Key Laboratory for Animal Disease Resistance Nutrition of the Ministry of Education, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China.
Bioinformatics. 2019 May 1;35(9):1469-1477. doi: 10.1093/bioinformatics/bty827.
Transcription termination is an important regulatory step of gene expression. If a gene lacks a terminator, transcription cannot stop, which results in abnormal gene expression. Detecting terminators helps determine operon structure in bacteria and improves genome annotation. Accurate identification of transcriptional terminators is therefore essential to research on transcriptional regulation.
In this study, we developed a new predictor called 'iTerm-PseKNC', based on a support vector machine, to identify transcription terminators. A binomial distribution approach was used to select the optimal feature subset from pseudo k-tuple nucleotide composition (PseKNC) features. In a 5-fold cross-validation test, the proposed method achieved an accuracy of 95%. To further evaluate the generalization ability of 'iTerm-PseKNC', the model was tested on independent datasets of experimentally confirmed Rho-independent terminators from the Escherichia coli and Bacillus subtilis genomes. All of the terminators in E. coli and 87.5% of the terminators in B. subtilis were correctly identified, suggesting that the proposed model could become a powerful tool for bacterial terminator recognition.
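The feature-selection idea above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: it uses plain overlapping k-mer counts rather than the full PseKNC (which also includes physicochemical correlation terms), it scores each k-mer with one common formulation of the binomial confidence level, CL = 1 - P(X >= n) with X ~ Binomial(N, q), and it leaves out the support vector machine step. The function names and the toy sequences are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import comb


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq; windows with non-ACGT letters are skipped."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return counts


def binomial_confidence(n_ij, N_i, q_j):
    """Return CL = 1 - P(X >= n_ij) for X ~ Binomial(N_i, q_j).

    n_ij: occurrences of the feature in class j
    N_i:  occurrences of the feature in all classes
    q_j:  share of all k-mer occurrences that fall in class j
    """
    tail = sum(
        comb(N_i, m) * q_j ** m * (1 - q_j) ** (N_i - m)
        for m in range(n_ij, N_i + 1)
    )
    return 1.0 - tail


def rank_features(pos_seqs, neg_seqs, k):
    """Rank k-mers by their highest confidence level across the two classes."""
    def pooled(seqs):
        total = kmer_counts("", k)  # all-zero table of every k-mer
        for s in seqs:
            for kmer, c in kmer_counts(s, k).items():
                total[kmer] += c
        return total

    pos, neg = pooled(pos_seqs), pooled(neg_seqs)
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    q_pos = pos_total / (pos_total + neg_total)
    q_neg = 1.0 - q_pos

    scores = {}
    for kmer in pos:
        N_i = pos[kmer] + neg[kmer]
        if N_i == 0:
            scores[kmer] = 0.0  # k-mer never observed: no evidence either way
            continue
        scores[kmer] = max(
            binomial_confidence(pos[kmer], N_i, q_pos),
            binomial_confidence(neg[kmer], N_i, q_neg),
        )
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    ranked = rank_features(["TTTTTT"], ["ACACAC"], k=2)
    print(ranked[:3])
```

In a pipeline of the kind described, the top-ranked features would presumably be added one at a time to a classifier and the subset with the best cross-validated accuracy retained; that step is omitted here.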
For the convenience of wet-lab researchers, a web server for 'iTerm-PseKNC' was established at http://lin-group.cn/server/iTerm-PseKNC/, through which users can easily obtain their desired results without working through the detailed mathematical equations involved.