Liu Lifen, Zhang Ge, He Shoupeng, Hu Xuehai
Department of Data Science and Big Data Technology, College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, 430070 Wuhan, China.
Bioinformatics. 2021 Apr 19;37(2):260-262. doi: 10.1093/bioinformatics/btaa1100.
Limited experimental data on transcription factor binding sites (TFBSs) in plants, together with the independent evolution of plant TFs, have left computational approaches for identifying plant TFBSs lagging behind comparable research in humans. Observing that TFs are highly conserved among plant species, we first employ a deep convolutional neural network (DeepCNN) to build 265 Arabidopsis TFBS prediction models from available DAP-seq (DNA affinity purification sequencing) datasets, and then transfer them to homologous TFs in other plants.
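As a rough illustration of how a convolutional model can score a DNA sequence for a binding site, the sketch below one-hot encodes a sequence, slides a single filter over it, applies global max pooling and feeds the result to a logistic output. It is a toy under stated assumptions, not the architecture used in this work: the filter weights, the biases, the example sequences and the choice of the G-box-like string CACGTG are all hypothetical, and a trained model would learn many filters from DAP-seq peaks rather than use a hand-set one.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv_scan(encoded, kernel, bias=0.0):
    """Slide one filter along the sequence and apply ReLU at each offset."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for k in range(width):
            for c in range(4):
                total += encoded[start + k][c] * kernel[k][c]
        scores.append(max(0.0, total))
    return scores

def predict(seq, kernel, bias, out_weight, out_bias):
    """Global max pooling over filter activations, then a logistic output unit.

    Assumes the sequence is at least as long as the filter.
    """
    pooled = max(conv_scan(one_hot(seq), kernel, bias))
    return 1.0 / (1.0 + math.exp(-(out_weight * pooled + out_bias)))

# Hypothetical hand-set filter that fires only on an exact CACGTG match
# (6 matching positions minus a bias of 5 leaves an activation of 1).
KERNEL = one_hot("CACGTG")

bound = predict("TTACACGTGAAT", KERNEL, bias=-5.0, out_weight=4.0, out_bias=-2.0)
unbound = predict("TTATATATAAAT", KERNEL, bias=-5.0, out_weight=4.0, out_bias=-2.0)
print(f"with motif: {bound:.3f}  without motif: {unbound:.3f}")
```

Because the score depends only on the strongest filter activation anywhere in the sequence, the position of the motif within the input does not matter, which is one reason this kind of filter can be read as a motif detector.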
DeepCNN not only outperforms gkm-SVM and MEME on Arabidopsis TFBS prediction but also learns the known motif for most Arabidopsis TFs, as well as cooperative TF motifs supported by protein-protein interaction evidence, which gives the models biological interpretability. Following the idea of transfer learning, trans-species prediction performance on ten TFs from three other plants (Oryza sativa, Zea mays and Glycine max) demonstrates the feasibility of the strategy.
The 265 trained Arabidopsis TFBS prediction models are packaged in a Docker image named TSPTFBS, which is freely available on DockerHub at https://hub.docker.com/r/vanadiummm/tsptfbs. Source code and documentation are available on GitHub at https://github.com/liulifenyf/TSPTFBS.
Supplementary data are available at Bioinformatics online.