Zhai Jingjing, Zhang Yuzhou, Zhang Chujun, Yin Xiaotong, Song Minggui, Tang Chenglong, Ding Pengjun, Li Zenglin, Ma Chuang
State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi, 712100, China.
Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, 712100, China.
Adv Sci (Weinh). 2025 Aug;12(30):e03135. doi: 10.1002/advs.202503135. Epub 2025 May 24.
The precise prediction of transcription factor binding sites (TFBSs) is crucial for understanding gene regulation. This study presents deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar to accurately predict TFBSs within and across plant species. By taking advantage of multi-task DL and transfer learning, deepTFBS leverages the knowledge learned from large-scale TF binding profiles to enhance the prediction of TFBSs in small-sample training and cross-species prediction tasks. When tested using available information on 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA, and DanQ, improving the area under the precision-recall curve (PRAUC) by 244.49%, 49.15%, and 23.32%, respectively. Further cross-species prediction of TFBSs in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also utilize information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data are limited. A case study focusing on the WUSCHEL (WUS) transcription factor illustrated the potential use of deepTFBS in cross-species applications, here between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.
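For readers unfamiliar with the metric, the following is a minimal sketch of how a PRAUC value and a relative improvement over a baseline might be computed. It is not taken from the deepTFBS code base: the function names, the toy labels and scores, the use of step-wise average precision as the PRAUC estimate, and the reading of "improvement" as a relative percentage change are all assumptions made for illustration.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Assumption: labels are 0/1 and scores have no ties; tied scores are
    ranked in an arbitrary but stable order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each true positive
    return ap / total_pos


def relative_improvement(new, baseline):
    """Percentage change of `new` relative to `baseline` (assumed definition)."""
    return (new - baseline) / baseline * 100.0


# Hypothetical toy data: 1 = bound site, 0 = unbound site.
labels = [1, 0, 1, 1, 0, 0]
baseline_scores = [0.4, 0.9, 0.5, 0.3, 0.6, 0.2]  # made-up baseline predictions
model_scores = [0.9, 0.3, 0.8, 0.4, 0.5, 0.1]     # made-up model predictions

baseline_prauc = average_precision(labels, baseline_scores)
model_prauc = average_precision(labels, model_scores)
gain = relative_improvement(model_prauc, baseline_prauc)
print(f"baseline={baseline_prauc:.3f} model={model_prauc:.3f} gain={gain:.2f}%")
```

Under this reading, a relative improvement above 100% means the PRAUC more than doubled with respect to the baseline.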