A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions.

Author information

Niu Xiaohui, Yang Kun, Zhang Ge, Yang Zhiquan, Hu Xuehai

Affiliations

College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China.

Publication information

Front Genet. 2020 Jan 8;10:1305. doi: 10.3389/fgene.2019.01305. eCollection 2019.

Abstract

Deciphering the code of cis-regulatory elements (CREs) is one of the core problems in modern biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identifying enhancer locations across the whole genome [discriminative enhancer prediction (DEP)] is necessary, it is more important to predict the specific cell or tissue types in which they are activated and functional [tissue-specific enhancer prediction (TSEP)]. Existing deep learning models have achieved great success in DEP, but they cannot be applied directly to TSEP because any given cell or tissue type has only a limited number of enhancer samples available for training. Here, we first adopted a previously reported deep learning architecture and then developed a novel training strategy for TSEP, named the "pretraining-retraining strategy" (PRS), which decomposes the training process into two successive stages: a pretraining stage trains on the whole enhancer dataset to perform DEP, and a retraining stage then starts from the pretrained model and trains on tissue-specific enhancer samples to perform TSEP. PRS proved valid for DEP, with an AUC of 0.922 and a GM (geometric mean) of 0.696 when tested on a larger-scale FANTOM5 enhancer dataset using five-fold cross-validation. Interestingly, starting from the pretrained model, only twenty additional epochs were needed to complete retraining for each of the 23 specific tissues or cell lines tested. For TSEP tasks, PRS achieved a mean GM of 0.806, significantly higher than the 0.528 achieved by gkm-SVM, an existing mainstream method for CRE prediction. Notably, PRS also proved superior to two other state-of-the-art methods, DEEP and BiRen. In summary, PRS draws on useful ideas from transfer learning and is a reliable method for TSEP.
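
The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a tiny logistic model stands in for the deep learning architecture, the data are synthetic, and the names `train`, `geometric_mean`, and `make_samples` are hypothetical. The abstract does not define GM, so the sketch assumes the common definition for imbalanced classification, the square root of sensitivity times specificity. Only the overall shape (pretrain on the pooled set, then continue training from those weights on a small tissue-specific set for twenty epochs) follows the text.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, samples, epochs, lr=0.1):
    """Run SGD on a logistic model, starting from the given weights."""
    w = list(weights)  # copy so the caller's weights are left untouched
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w

def geometric_mean(weights, samples):
    """Assumed GM: sqrt(sensitivity * specificity) at a 0.5 threshold."""
    tp = fn = tn = fp = 0
    for x, y in samples:
        pred = sigmoid(sum(wi * xi for wi, xi in zip(weights, x))) >= 0.5
        if y == 1:
            tp += pred
            fn += not pred
        else:
            fp += pred
            tn += not pred
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)

def make_samples(n, boundary, rng):
    """Synthetic toy data: [feature, bias]; label is 1 when feature > boundary."""
    samples = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        samples.append(([x, 1.0], 1 if x > boundary else 0))
    return samples

rng = random.Random(0)
pooled = make_samples(500, 0.0, rng)  # stand-in for the whole enhancer set (DEP)
tissue = make_samples(30, 1.0, rng)   # stand-in for a small tissue-specific set (TSEP)

# Stage 1: pretraining on the pooled data.
pretrained = train([0.0, 0.0], pooled, epochs=50)
# Stage 2: retraining starts from the pretrained weights, not from scratch.
retrained = train(pretrained, tissue, epochs=20)

print("GM on tissue set before retraining:", geometric_mean(pretrained, tissue))
print("GM on tissue set after retraining: ", geometric_mean(retrained, tissue))
```

The design point is that stage 2 reuses the stage-1 weights as its starting point, so the small tissue-specific set only has to adjust an already useful model rather than fit one from nothing.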

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/220c/6960260/28b4494a6c95/fgene-10-01305-g001.jpg
