School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou, 310018, P.R. China.
School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China.
BMC Genomics. 2024 Oct 31;25(1):1021. doi: 10.1186/s12864-024-10951-6.
Bacterial small regulatory RNAs (sRNAs) play a crucial role in cellular metabolism and are potential new drug targets for treating pathogen-induced diseases. However, experimental identification of sRNAs still requires a large investment of human and material resources.
In this study, we propose a novel sRNA prediction model, sRNAdeep, which combines DistilBERT feature extraction with a TextCNN classifier. Bacterial sRNA and non-sRNA sequences are treated as sentences and fed into this composite deep learning model, whose classification performance was then evaluated.
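The "sequences as sentences" idea and the TextCNN step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on stated assumptions: the abstract does not say how sRNAdeep tokenizes sequences, so overlapping k-mers with k = 3 are assumed here, and the function names `seq_to_sentence` and `textcnn_feature` are hypothetical. In sRNAdeep the token embeddings would come from DistilBERT; the sketch uses hand-written toy vectors instead and shows a single convolutional filter followed by ReLU and max-over-time pooling, the core operation of a TextCNN.

```python
from typing import List


def seq_to_sentence(seq: str, k: int = 3) -> str:
    """Turn a nucleotide sequence into a 'sentence' of overlapping k-mer 'words'.

    Assumption: k-mer tokenization (k=3) is illustrative only; the abstract
    does not specify sRNAdeep's tokenization.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    if k <= 0 or len(seq) < k:
        return ""
    return " ".join(seq[i:i + k] for i in range(len(seq) - k + 1))


def textcnn_feature(embeddings: List[List[float]],
                    kernel: List[List[float]],
                    bias: float = 0.0) -> float:
    """One TextCNN filter over a sequence of token embeddings.

    Slides a kernel of shape (width, dim) along the token axis, applies ReLU
    to each window score, then max-pools over time to yield one feature.
    Hypothetical helper; real models learn many such kernels.
    """
    width = len(kernel)
    scores = []
    for start in range(len(embeddings) - width + 1):
        s = bias
        for offset in range(width):
            # dot product of one kernel row with one token embedding
            s += sum(w * x for w, x in zip(kernel[offset], embeddings[start + offset]))
        scores.append(max(0.0, s))  # ReLU
    return max(scores) if scores else 0.0  # max-over-time pooling


print(seq_to_sentence("ACGUAC"))  # ACG CGT GTA TAC
print(textcnn_feature([[1, 0], [0, 1], [1, 1]], [[1, 0], [0, 1]]))  # 2.0
```

Max-over-time pooling makes each filter's output independent of sequence length, which is why TextCNN-style heads suit variable-length inputs such as sRNA candidates.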
By filtering sRNAs from the BSRD database, we obtained a validation dataset comprising 2438 positive and 4730 negative samples. Benchmark experiments showed that sRNAdeep outperformed previous sRNA prediction tools across various evaluation metrics. By applying our tool to the Mycobacterium tuberculosis (MTB) genome, we identified 21 sRNAs within intergenic and intron regions, together with a set of 272 target genes regulated by these sRNAs. The proteins encoded by two of these genes (lysX and icd1) are implicated in drug response and contain significant active sites related to the drug-resistance mechanisms of MTB.
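The abstract does not list which evaluation metrics ("indexes") were compared. As an assumption, the sketch below computes measures that are common for binary sequence classifiers, namely sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC), from a confusion matrix. MCC is worth including because the dataset is imbalanced (roughly 34% positives: 2438 of 7168 samples), where accuracy alone can be misleading. The function name `binary_metrics` is hypothetical, and the example counts are a sanity check, not reported results.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common binary-classification metrics from confusion-matrix counts.

    Assumption: these are typical sRNA-prediction metrics, not necessarily
    the exact set reported for sRNAdeep.
    """
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity / recall on sRNAs
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity on non-sRNAs
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Sanity check: a hypothetical perfect classifier on a dataset of the stated size.
print(binary_metrics(tp=2438, fp=0, tn=4730, fn=0))
```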
In conclusion, sRNAdeep can help researchers identify bacterial sRNAs more precisely and is freely available at https://github.com/pyajagod/sRNAdeep.git .