Liu Xiao, Xu Yuqiao, Luo Yachuan, Teng Li
School of Microelectronics and Communication Engineering, Chongqing University, 174 Shapingba District, Chongqing, 400044, China.
Bioprocess Biosyst Eng. 2022 May;45(5):955-967. doi: 10.1007/s00449-022-02716-w. Epub 2022 Mar 13.
Promoters are relevant to research on many diseases, such as coronary heart disease, diabetes and tumors, and identifying them is a fundamental task. Deep learning is widely used to recognize promoter sequences. Although deep models recognize promoters quickly and accurately, they depend on large amounts of high-quality data. To reduce this dependence in promoter prediction, we applied transfer learning to a typical deep network built on the residual idea, the deep residual network (ResNet). We represented promoters with binary one-hot encoding and used ResNet to extract feature representations from organisms with abundant promoter data. We then transferred the learned structural parameters to target organisms with insufficient promoter data to improve ResNet's generalization in those organisms. We evaluated the approach on promoter datasets from four organisms (Bacillus subtilis, Escherichia coli, Saccharomyces cerevisiae and Drosophila melanogaster). After deep transfer, ResNet's promoter prediction reached AUCs of 0.8537 in prokaryotes and 0.8633 in eukaryotes, increases of 0.1513 and 0.1376, respectively.
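The abstract states only that promoters are represented with binary one-hot encoding before being fed to ResNet. The sketch below illustrates that kind of encoding under stated assumptions: the A/C/G/T channel order and the all-zero row for ambiguous bases such as N are choices made here, and `one_hot_encode` and `NUCLEOTIDES` are hypothetical names, not the authors' code.

```python
# Minimal sketch of binary one-hot encoding for DNA sequences.
# Assumptions (not from the paper): channel order A, C, G, T, and an
# all-zero row for any base outside ACGT (e.g. the ambiguity code N).
from typing import List

NUCLEOTIDES = "ACGT"  # hypothetical, assumed channel order


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return a len(sequence) x 4 matrix of 0/1 values, one row per base."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        position = index.get(base)
        if position is not None:
            row[position] = 1  # exactly one 1 for each recognized base
        matrix.append(row)  # unrecognized bases stay all zeros
    return matrix


if __name__ == "__main__":
    # TATAAT is the consensus bacterial -10 box, used here as a toy input.
    for row in one_hot_encode("TATAAT"):
        print(row)
```

Each base becomes its own 4-channel vector, so a convolutional network such as ResNet can scan the resulting length x 4 matrix without imposing any numeric ordering among nucleotides.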