iDNA-ABT: advanced deep learning model for detecting DNA methylation with adaptive features and transductive information maximization.

Affiliations

School of Software, Shandong University, Jinan, China.

Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.

Publication information

Bioinformatics. 2021 Dec 11;37(24):4603-4610. doi: 10.1093/bioinformatics/btab677.

Abstract

MOTIVATION

DNA methylation plays an important role in epigenetic modification and in the occurrence and development of diseases. Identifying DNA methylation sites is therefore critical for understanding and revealing their functional mechanisms. To date, several machine learning and deep learning methods have been developed to predict different types of DNA methylation. However, they still rely heavily on handcrafted features, which can severely limit the extraction of high-level latent information. Moreover, most of them are designed for one specific DNA methylation type and therefore cannot predict multiple types of methylation sites across multiple species simultaneously. In this study, we propose iDNA-ABT, an advanced deep learning model that combines adaptive embedding based on Bidirectional Encoder Representations from Transformers (BERT) with transductive information maximization (TIM).
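TIM, in its general form as originally proposed for few-shot classification, adds a mutual-information term to a supervised cross-entropy loss: loss = ce_weight * CE - (H(Y) - cond_weight * H(Y|X)). Here H(Y) is the entropy of the batch-averaged (marginal) prediction, which penalizes collapsing every sample onto one class, and H(Y|X) is the mean per-sample prediction entropy, which rewards confident predictions. The sketch below is a minimal illustration of that general form only. The weights `ce_weight` and `cond_weight`, the function names, and computing both entropy terms over a single labelled batch are assumptions; the abstract does not say how iDNA-ABT combines the terms, so this is not the authors' code.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guards against log(0)


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(pk * math.log(pk + EPS) for pk in p)


def tim_style_loss(probs: List[List[float]], labels: List[int],
                   ce_weight: float = 1.0, cond_weight: float = 1.0) -> float:
    """Hypothetical TIM-style loss: weighted cross-entropy minus an
    estimate of the mutual information between inputs and predictions.

    probs:  per-sample class-probability vectors (each row sums to 1)
    labels: integer class labels, one per sample
    ce_weight, cond_weight: assumed weighting hyperparameters
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    n = len(probs)
    num_classes = len(probs[0])

    # Supervised cross-entropy on the labelled samples.
    ce = -sum(math.log(p[y] + EPS) for p, y in zip(probs, labels)) / n

    # Marginal entropy H(Y): entropy of the batch-averaged prediction.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    h_marginal = entropy(marginal)

    # Conditional entropy H(Y|X): average entropy of each prediction.
    h_conditional = sum(entropy(p) for p in probs) / n

    mutual_info = h_marginal - cond_weight * h_conditional
    return ce_weight * ce - mutual_info
```

For a binary (methylated vs. non-methylated) batch, confident and class-balanced predictions such as `[[1.0, 0.0], [0.0, 1.0]]` drive the loss toward -ln 2, while collapsed predictions such as `[[0.5, 0.5], [0.5, 0.5]]` give roughly +ln 2, which is the behaviour the information-maximization term is meant to encourage.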

RESULTS

Benchmark results show that iDNA-ABT can automatically and adaptively learn discriminative features of biological sequences from multiple species and thus performs significantly better than state-of-the-art methods in predicting three different DNA methylation types. In addition, a comparison experiment shows that the TIM loss is effective in binary classification tasks. Furthermore, by comparing the adaptive embedding with six handcrafted feature encodings, we verify that our features are highly adaptable and robust across species. Importantly, our model generalizes well across species, demonstrating that it can adaptively capture cross-species differences and improve predictive performance. For convenient use of our method, we have also established an online webserver that implements iDNA-ABT.
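The abstract does not name the six handcrafted encodings used in the comparison. As a generic illustration of what "handcrafted" means here, one-hot encoding (shown below) maps each nucleotide to a fixed vector chosen by hand; it is an example of the category, not necessarily one of the six. The function and table names are hypothetical, and mapping ambiguous bases such as `N` to an all-zero row is an assumed convention.

```python
from typing import List

# Fixed, hand-chosen mapping: each nucleotide gets one indicator position.
NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as an L x 4 binary matrix.

    Ambiguous bases (e.g. 'N') become all-zero rows (assumed convention).
    """
    encoded = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = NUCLEOTIDE_INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        encoded.append(row)
    return encoded


print(one_hot_encode("ACGN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```

Each base receives the same vector regardless of its neighbours, whereas a BERT-style embedding is contextual: the representation of a position depends on the surrounding sequence and is learned from data, which is the contrast the comparison above is testing.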

AVAILABILITY AND IMPLEMENTATION

Our proposed iDNA-ABT and data are freely accessible via http://server.wei-group.net/iDNA_ABT, and our source code is available for download from the GitHub repository (https://github.com/YUYING07/iDNA_ABT).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.