Arora Akanksha, Raghava Gajendra Pal Singh
Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, Okhla Phase 3, A-302 (R&D Block), New Delhi, 110020, India.
Sci Rep. 2025 Aug 25;15(1):31191. doi: 10.1038/s41598-025-15814-y.
In this study, we investigated the properties of exosomal miRNAs to identify potential biomarkers for liquid biopsy. We collected 956 exosomal and 956 non-exosomal miRNA sequences from RNALocate and miRBase to develop predictive models. Our initial analysis revealed that specific nucleotides are preferred at certain positions in miRNAs associated with exosomes. We employed an alignment-based approach, artificial intelligence (AI) models, and ensemble methods to predict exosomal miRNAs. For the alignment-based approach, we used a motif-based method with MERCI and a similarity-based method with BLAST, which achieved high precision but low coverage (about 29%). The AI models, developed using machine learning, deep learning techniques, and pretrained language models, achieved a maximum AUC of 0.707 and an MCC of 0.268 on an independent dataset. Finally, our ensemble method, which combines the alignment-based and AI-based models, reached a maximum AUC of 0.73 and an MCC of 0.352 on an independent dataset. We have developed a web server, EmiRPred, to assist the scientific community in predicting and designing exosomal miRNAs and identifying associated motifs (https://webs.iiitd.edu.in/raghava/emirpred/).
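The abstract reports MCC values and describes an ensemble that combines alignment-based evidence with AI model output, but it does not state how the two are merged. The Python sketch below illustrates one plausible scheme under stated assumptions: the function names, the additive `weight` of 0.5, and the 0.5 decision threshold are all hypothetical and are not taken from EmiRPred. Only the MCC formula is standard.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def hybrid_score(ml_prob, alignment_hit=None, weight=0.5):
    """Combine a model probability with optional alignment evidence.

    ASSUMPTION (illustrative only): alignment_hit is +1 for a match to a
    known exosomal sequence or motif, -1 for a non-exosomal match, and
    None when the alignment step finds nothing (the low-coverage case).
    """
    if alignment_hit is None:
        return ml_prob  # fall back to the AI model alone
    return ml_prob + weight * alignment_hit


def predict_exosomal(ml_prob, alignment_hit=None, threshold=0.5):
    """Label a sequence as exosomal if the hybrid score reaches the threshold."""
    return hybrid_score(ml_prob, alignment_hit) >= threshold


if __name__ == "__main__":
    # A borderline model score is overridden by a positive alignment hit.
    print(predict_exosomal(0.4))
    print(predict_exosomal(0.4, alignment_hit=1))
    print(round(mcc(tp=60, tn=70, fp=30, fn=40), 3))
```

The design point this illustrates is that a high-precision, low-coverage alignment step can adjust predictions for the minority of sequences it matches, while the AI model supplies a score for every remaining sequence.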