iDNA-OpenPrompt: OpenPrompt learning model for identifying DNA methylation.

Authors

Yu Xia, Ren Jia, Long Haixia, Zeng Rao, Zhang Guoqiang, Bilal Anas, Cui Yani

Affiliations

School of Information and Communication Engineering, Hainan University, Haikou, Hainan, China.

School of Information Science and Technology, Hainan Normal University, Haikou, Hainan, China.

Publication information

Front Genet. 2024 Apr 16;15:1377285. doi: 10.3389/fgene.2024.1377285. eCollection 2024.

Abstract

DNA methylation is a critical epigenetic modification in which a methyl group is added to the DNA molecule; it plays a key role in regulating gene expression without changing the DNA sequence. The main difficulty in identifying DNA methylation sites lies in the subtle and complex nature of methylation patterns, which may vary across tissues, developmental stages, and environmental conditions. Traditional methods for methylation site identification, such as bisulfite sequencing, are typically labor-intensive and costly and require large amounts of DNA, which hinders high-throughput analysis. Moreover, these methods may not always provide the resolution needed to detect methylation at specific sites, especially in genomic regions that are rich in repetitive sequences or have low levels of methylation. Furthermore, current deep learning approaches generally lack sufficient accuracy. This study introduces the iDNA-OpenPrompt model, which leverages the novel OpenPrompt learning framework. The model combines a prompt template, a prompt verbalizer, and a Pre-trained Language Model (PLM) to construct a prompt-learning framework for DNA methylation sequences. In addition, a DNA vocabulary library, a BERT tokenizer, and specific label words are introduced into the model to enable accurate identification of DNA methylation sites. An extensive analysis is conducted to evaluate the predictive performance, reliability, and consistency of the iDNA-OpenPrompt model. The experimental results, covering 17 benchmark datasets that span various species and three DNA methylation modifications (4mC, 5hmC, 6mA), consistently indicate that our model surpasses existing approaches in performance and robustness.
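
To make the roles of the prompt template, verbalizer, and label words more concrete, the following is a minimal, library-free sketch of a cloze-style prompt pipeline for a DNA sequence. It is not the authors' implementation: the overlapping k-mer tokenization, the template wording, the label words, and the helper names (`kmer_tokens`, `build_prompt`, `verbalize`) are all assumptions made for illustration, and the PLM is replaced by a hand-written dictionary of scores for the masked position.

```python
from typing import Dict, List

MASK = "[MASK]"  # placeholder the language model would be asked to fill


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (assumed vocabulary scheme)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_prompt(seq: str, k: int = 3) -> str:
    """Wrap the tokenized sequence in a cloze template with one masked slot.

    The template wording is hypothetical, not taken from the paper.
    """
    return f"{' '.join(kmer_tokens(seq, k))} . This site is {MASK} ."


# Hypothetical verbalizer: each class is tied to a few label words.
VERBALIZER: Dict[str, List[str]] = {
    "methylated": ["yes", "methylated"],
    "unmethylated": ["no", "unmethylated"],
}


def verbalize(mask_scores: Dict[str, float]) -> str:
    """Average the scores of each class's label words and return the best class."""
    class_scores = {
        label: sum(mask_scores.get(word, 0.0) for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    return max(class_scores, key=class_scores.get)


prompt = build_prompt("ACGTAC")

# Stand-in for the PLM: made-up scores for words at the masked position.
toy_scores = {"yes": 0.6, "methylated": 0.3, "no": 0.05, "unmethylated": 0.05}
prediction = verbalize(toy_scores)
```

In a real prompt-learning setup, the scores would come from the PLM's output distribution at the masked token, and the template and verbalizer would typically be handled by a prompt-learning library rather than by hand-written helpers like these.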

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1ac/11058834/9378354d6b56/fgene-15-1377285-g001.jpg
