School of Software, Shandong University, Jinan 250101, China.
Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad352.
Small peptides encoded by non-coding RNAs (ncPEPs) have recently emerged as promising targets and biomarkers for cancer immunotherapy, so identifying cancer-associated ncPEPs is crucial for cancer research. In this work, we propose CoraL, a novel supervised contrastive meta-learning framework for predicting cancer-associated ncPEPs. Specifically, the proposed meta-learning strategy enables our model to learn meta-knowledge from different types of peptides and to train a promising predictive model even with few labeled samples. The results show that our model can make high-confidence predictions on unseen cancer biomarkers with only five samples, potentially accelerating the discovery of novel cancer biomarkers for immunotherapy. Moreover, our approach remarkably outperforms existing deep learning models on 15 cancer-associated ncPEP datasets, demonstrating its effectiveness and robustness. Interestingly, our model also performs outstandingly when extended to the identification of short open reading frames derived from ncPEPs, demonstrating the strong predictive ability of CoraL at the transcriptome level. Importantly, our feature interpretation analysis discovers unique sequential patterns that serve as a fingerprint for each cancer-associated ncPEP type, revealing relationships among certain cancer biomarkers that are validated by relevant literature and motif comparison. Overall, we expect CoraL to be a useful tool for deciphering the pathogenesis of cancer and to provide valuable information for cancer research. The dataset and source code of our proposed method can be found at https://github.com/Johnsunnn/CoraL.
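As an illustration of the few-shot setting mentioned above (learning from only five labeled samples per class), the following is a minimal sketch of how N-way, K-shot training episodes are commonly sampled in meta-learning. It is a generic example and not taken from the CoraL code base: the function name `sample_episode`, the class labels, and the randomly generated peptide strings are all hypothetical placeholders.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_episode(samples, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode from (sequence, label) pairs.

    Returns (support, query); each is a list of (sequence, label) pairs.
    Hypothetical helper for illustration, not part of CoraL.
    """
    by_label = defaultdict(list)
    for seq, label in samples:
        by_label[label].append(seq)

    # Only classes with enough examples for both support and query sets.
    eligible = sorted(
        lab for lab, seqs in by_label.items() if len(seqs) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for lab in rng.sample(eligible, n_way):
        picked = rng.sample(by_label[lab], k_shot + n_query)
        support += [(s, lab) for s in picked[:k_shot]]
        query += [(s, lab) for s in picked[k_shot:]]
    return support, query


# Toy data: made-up peptide strings and made-up class names (assumptions).
rng = random.Random(0)
toy_samples = [
    ("".join(rng.choice(AMINO_ACIDS) for _ in range(12)), label)
    for label in ("class_a", "class_b", "class_c")
    for _ in range(8)
]

# One 2-way, 5-shot episode with 2 query peptides per class.
support, query = sample_episode(toy_samples, n_way=2, k_shot=5, n_query=2, rng=rng)
```

In a meta-learning loop, a model would be adapted on `support` and evaluated on `query` for many such episodes; how CoraL combines this with its supervised contrastive objective is described in the paper and the linked repository.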