EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species.

Affiliations

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.

Publication information

Comput Biol Med. 2023 Jun;160:107030. doi: 10.1016/j.compbiomed.2023.107030. Epub 2023 May 11.

Abstract

Methylation is a major DNA epigenetic modification that regulates biological processes without altering the DNA sequence, and multiple types of DNA methylation have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches have been developed to automatically identify DNA methylation sites using machine learning or deep learning algorithms. Machine learning (ML) based methods are difficult to transfer to other DNA methylation site prediction tasks using additional knowledge. Deep learning (DL) methods may facilitate the transfer of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes EpiTEAmDNA, an integrated feature representation framework based on transfer learning and ensemble learning, which is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates a convolutional neural network (CNN) with conventional machine learning methods and performs better than existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggest that the EpiTEAmDNA models may be further improved via transfer learning based on additional knowledge. Evaluation on independent test datasets also suggests that the proposed EpiTEAmDNA framework outperforms existing models in most prediction tasks for the 3 DNA methylation types across 15 species. The source code, pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformaticslab.org/supp/.
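
The abstract does not specify how sequences are encoded or how model outputs are combined, so the following is only a minimal illustrative sketch of the general idea of building a sequence feature representation and combining several predictors. It is not the EpiTEAmDNA implementation. The one-hot and k-mer encodings, the soft-voting rule, and all function names (`one_hot`, `kmer_frequencies`, `soft_vote`) are assumptions introduced here for illustration.

```python
from itertools import product

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a flat one-hot vector (4 values per base)."""
    vec = []
    for base in seq.upper():
        # Bases outside ACGT (for example N) become an all-zero group.
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing unknown bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def soft_vote(probabilities, weights=None):
    """Weighted average of per-model positive-class probabilities.

    Hypothetical ensemble rule chosen for illustration; the paper's
    actual combination strategy is not described in the abstract.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


# A toy 5-base window around a candidate methylation site.
window = "ACGTA"
features = one_hot(window) + kmer_frequencies(window, k=2)

# Made-up probabilities standing in for three base models
# (for example a CNN and two conventional classifiers).
score = soft_vote([0.9, 0.6, 0.3])
```

In this sketch the concatenated `features` vector would be the input to each base model, and `score` would be thresholded to call a site methylated or not. A real pipeline would replace the made-up probabilities with outputs of trained models.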
