Ensemble learning-based predictor for driver synonymous mutation with sequence representation.

Author information

Bi Chuanmei, Shi Yong, Xia Junfeng, Liang Zhen, Wu Zhiqiang, Xu Kai, Cheng Na

Affiliations

School of Biomedical Engineering, Anhui Medical University, Hefei, China.

Institutes of Physical Science and Information Technology, Anhui University, Hefei, China.

Publication information

PLoS Comput Biol. 2025 Jan 6;21(1):e1012744. doi: 10.1371/journal.pcbi.1012744. eCollection 2025 Jan.

Abstract

Synonymous mutations, once considered neutral, are now understood to have significant implications for a variety of diseases, particularly cancer. Identifying these driver synonymous mutations in human cancers is essential, yet current methods are constrained by limited data. In this study, we first investigate the impact of sequence-based features, including DNA shape, physicochemical properties and one-hot encoding of nucleotides, as well as deep learning-derived features from BERT-based pre-trained chemical molecule language models. We then propose EPEL, an effect predictor for synonymous mutations that employs ensemble learning. EPEL combines five tree-based models and optimizes feature selection to enhance predictive accuracy. Notably, the incorporation of DNA shape features and deep learning-derived features from chemical molecules represents a pioneering effort in assessing the impact of synonymous mutations in cancer. Compared with existing state-of-the-art methods, EPEL demonstrates superior performance on the independent test dataset. Furthermore, our analysis reveals a significant correlation between effect scores and patient outcomes across various cancer types. Interestingly, while deep learning methods have shown promise in other fields, their DNA sequence representations do not significantly enhance the identification of driver synonymous mutations in this study. Overall, we anticipate that EPEL will help researchers target driver synonymous mutations more precisely. EPEL is designed with flexibility, allowing users to retrain the prediction model and generate effect scores for synonymous mutations in human cancers. A user-friendly web server for EPEL is available at http://ahmu.EPEL.bio/.
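
As a rough illustration of two ideas named in the abstract, the sketch below shows one-hot encoding of nucleotides and the averaging of scores from several models. It is not the EPEL implementation: the function names, the A/C/G/T column order, the all-zero handling of unknown bases, and the use of unweighted averaging to combine the five models are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import List, Sequence

# Column order for the one-hot vectors (an assumption of this sketch).
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as one 4-element indicator vector per base."""
    encoded = []
    for base in sequence.upper():
        # Bases outside A/C/G/T (for example N) become an all-zero vector.
        encoded.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return encoded


def average_scores(model_scores: Sequence[float]) -> float:
    """Combine per-model probabilities by unweighted averaging (soft voting)."""
    if not model_scores:
        raise ValueError("at least one model score is required")
    return sum(model_scores) / len(model_scores)
```

Here `one_hot_encode` would be applied to the sequence window around a mutation, and `average_scores` to the hypothetical outputs of the individual tree-based models for that mutation.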


