Bhasuran Balu, Manoharan Sharanya, Iyyappan Oviya Ramalakshmi, Murugesan Gurusamy, Prabahar Archana, Raja Kalpana
School of Information, Florida State University, Tallahassee, FL 32306, USA.
Department of Bioinformatics, Stella Maris College, Chennai 600086, Tamil Nadu, India.
Biomedicines. 2024 Jul 10;12(7):1535. doi: 10.3390/biomedicines12071535.
MicroRNA (miRNA)-messenger RNA (mRNA or gene) interactions are pivotal in many biological processes, including the regulation of gene expression, cellular differentiation, proliferation, apoptosis, and development, as well as the maintenance of cellular homeostasis and the pathogenesis of numerous diseases, such as cancer, cardiovascular diseases, neurological disorders, and metabolic conditions. Understanding the mechanisms of miRNA-mRNA interactions can provide insights into disease mechanisms and potential therapeutic targets. However, extracting these interactions efficiently from the large body of published articles indexed in PubMed is challenging. In the current study, we annotated a miRNA-mRNA Interaction Corpus (MMIC) and used it to evaluate how well a variety of machine learning (ML) models, deep learning-based transformer (DLT) models, and large language models (LLMs) extract the miRNA-mRNA interactions mentioned in PubMed. We used genomics approaches to validate the extracted miRNA-mRNA interactions. Among the ML, DLT, and LLM models, PubMedBERT showed the highest precision, recall, and F-score, all equal to 0.783. Among the LLMs, Llama 2 performed best, achieving 0.56 precision, 0.86 recall, and 0.68 F-score in a zero-shot experiment and 0.56 precision, 0.87 recall, and 0.68 F-score in a three-shot experiment. Our study shows that Llama 2 achieves better recall than the ML and DLT models but leaves room for improvement in precision and F-score.
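The reported F-scores can be checked against the reported precision and recall values. The sketch below assumes the F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` and the `reported` dictionary are illustrative and not taken from the study.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the abstract's "F-score" is the balanced F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "PubMedBERT": (0.783, 0.783),
    "Llama 2 zero-shot": (0.56, 0.86),
    "Llama 2 three-shot": (0.56, 0.87),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f_score(p, r):.3f}")
```

Under that assumption, both Llama 2 settings round to 0.68, matching the reported values, and equal precision and recall of 0.783 give an F1 of 0.783.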