IEEE J Biomed Health Inform. 2024 Feb;28(2):1144-1151. doi: 10.1109/JBHI.2023.3343075. Epub 2024 Feb 5.
Accurate identification of driver mutations is crucial in genetic studies of human cancers. While numerous cancer driver missense mutations have been identified, efforts to identify cancer driver synonymous mutations have had limited success to date. Here, we developed a novel machine learning framework, epSMic, for predicting cancer driver synonymous mutations. epSMic employs an iterative feature representation scheme that learns discriminative features from several sequential models in a supervised, iterative manner. We constructed benchmark datasets and encoded sequence embeddings, physicochemical properties, and basic information such as conservation and splicing features. Evaluation on the benchmark test datasets shows that epSMic outperforms existing methods, making it a valuable tool for identifying functional synonymous mutations in cancer. We hope epSMic will enable researchers to concentrate on synonymous mutations that have a functional impact on cancer.
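The abstract does not spell out how the iterative feature representation scheme works, so the following is only a minimal sketch of one common reading of the idea, not the epSMic implementation. It assumes that each round fits one supervised scorer per feature group and feeds the resulting scores back in as extra features for the next round. The nearest-centroid scorer, the feature grouping, the toy data, and all function names (`make_toy_data`, `centroid_scorer`, `iterative_representation`) are hypothetical choices made for illustration.

```python
import math
import random


def make_toy_data(n=40, seed=0):
    """Synthetic stand-in for encoded mutations (NOT real data): 4 features per row."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = "driver", 0 = "passenger" (toy labels)
        X.append([rng.gauss(label * 1.0, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


def centroid_scorer(X, y, cols):
    """Fit a nearest-centroid scorer on the given columns; return row -> score in [0, 1]."""
    pos = [r for r, label in zip(X, y) if label == 1]
    neg = [r for r, label in zip(X, y) if label == 0]
    mu_pos = [sum(r[c] for r in pos) / len(pos) for c in cols]
    mu_neg = [sum(r[c] for r in neg) / len(neg) for c in cols]

    def score(row):
        d_pos = sum((row[c] - m) ** 2 for c, m in zip(cols, mu_pos))
        d_neg = sum((row[c] - m) ** 2 for c, m in zip(cols, mu_neg))
        # Clamp to avoid overflow; closer to the positive centroid gives a score above 0.5.
        z = max(-50.0, min(50.0, d_pos - d_neg))
        return 1.0 / (1.0 + math.exp(z))

    return score


def iterative_representation(X, y, feature_groups, n_rounds=3):
    """Each round: fit one scorer per feature group on that group's columns plus the
    previous round's score columns, then replace the score columns with the new outputs."""
    n_base = len(X[0])
    current = [list(row) for row in X]
    for _ in range(n_rounds):
        extra = list(range(n_base, len(current[0])))  # score columns from the last round
        scorers = [centroid_scorer(current, y, cols + extra) for cols in feature_groups]
        current = [row[:n_base] + [s(row) for s in scorers] for row in current]
    return current


# Hypothetical grouping: columns 0-1 "embedding", 2 "physicochemical", 3 "conservation/splicing".
groups = [[0, 1], [2], [3]]
X, y = make_toy_data()
rep = iterative_representation(X, y, groups)
print(len(rep), len(rep[0]))
```

In this sketch the final representation keeps the original features and appends one learned score per feature group; a downstream classifier would be trained on that representation. Scores here are computed on the same rows used for fitting, which a real pipeline would avoid by using cross-validated predictions.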