School of Software, Shandong University, Jinan 250101, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China.
Comput Biol Med. 2023 Sep;164:107260. doi: 10.1016/j.compbiomed.2023.107260. Epub 2023 Jul 21.
The promoter region, located proximal to the transcription start site, controls the initiation of gene transcription by modulating the interaction with RNA polymerase. Accurate recognition of promoter regions is therefore a critical focus in bioinformatics. Although some methods that leverage pre-trained language models (PLMs) for promoter prediction have been proposed, the full potential of such PLMs remains largely untapped. In this study, we introduce PLPMpro, a model that combines prompt learning with a pre-trained language model to improve the prediction of promoter sequences. PLPMpro uses the prompt-learning paradigm to exploit the inherent capacities of the PLM, resulting in substantial improvements in prediction performance. Experimental results demonstrate that prompt learning strengthens the capabilities of the pre-trained model, and PLPMpro surpasses both typical pre-trained model-based methods for promoter prediction and typical deep learning methods. Furthermore, we conduct experiments to examine how different prompt-learning settings and different numbers of soft modules affect model performance. More importantly, the interpretation experiment reveals that the pre-trained model captures biological semantics. Collectively, this research offers a new perspective on how PLMs can be used to address biological problems.