MSLP: mRNA subcellular localization predictor based on machine learning techniques.

Affiliations

College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

Computer Science Department, Southern Connecticut State University, New Haven, CT, USA.

Publication information

BMC Bioinformatics. 2023 Mar 22;24(1):109. doi: 10.1186/s12859-023-05232-0.

Abstract

BACKGROUND

Subcellular localization of messenger RNAs (mRNAs) plays a pivotal role in the regulation of gene expression, in cell migration, and in cellular adaptation. Experimental techniques for pinpointing the subcellular localization of mRNAs are laborious, time-consuming, and expensive. In silico approaches for this purpose are therefore attracting considerable attention in the RNA community.

METHODS

In this article, we propose MSLP, a machine learning-based method for predicting the subcellular localization of mRNAs. MSLP uses a novel combination of four feature types as input to machine learning algorithms: k-mer composition, pseudo k-tuple nucleotide composition (PseKNC), physicochemical properties of nucleotides, and a 3D representation of sequences based on the Z-curve transformation.
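To make two of the four feature types concrete, the following is a minimal Python sketch of k-mer frequencies and the standard cumulative Z-curve coordinates. It is an illustration only and is not taken from the MSLP code base: the function names (`normalize`, `kmer_frequencies`, `z_curve`), the choice of k = 3, and the decision to skip ambiguous bases are all assumptions, and the features that MSLP actually derives from the Z-curve may differ.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style T is mapped to U below


def normalize(seq):
    """Upper-case the sequence and map T to U (hypothetical helper)."""
    return seq.upper().replace("T", "U")


def kmer_frequencies(seq, k=3):
    """Return relative frequencies of all 4**k k-mers, in lexicographic order.

    Overlapping windows are counted; windows containing a character
    outside ACGU (for example N) are skipped. This is an assumption,
    not necessarily what MSLP does.
    """
    seq = normalize(seq)
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            windows += 1
    if windows == 0:
        return [0.0] * len(kmers)
    return [counts[w] / windows for w in kmers]


def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) of a sequence.

    x: purine (A, G) versus pyrimidine (C, U)
    y: amino (A, C) versus keto (G, U)
    z: weak hydrogen bond (A, U) versus strong hydrogen bond (C, G)
    """
    x = y = z = 0
    points = []
    for base in normalize(seq):
        if base not in ALPHABET:
            continue  # skip ambiguous bases (assumption)
        x += 1 if base in "AG" else -1
        y += 1 if base in "AC" else -1
        z += 1 if base in "AU" else -1
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    example = "AUGGCCAUUGUAAUGGGCCGC"  # made-up sequence for illustration
    print(len(kmer_frequencies(example, k=3)))  # 64 features for k = 3
    print(z_curve(example)[-1])                 # final Z-curve point
```

A fixed-length feature vector could then be built by concatenating the k-mer frequencies with summary statistics of the Z-curve, alongside PseKNC and physicochemical descriptors; how MSLP combines them is described in the paper and repository rather than here.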

RESULTS

Using the combination of the above-mentioned features, ensemble-based models achieved state-of-the-art results in mRNA subcellular localization prediction on multiple benchmark datasets. We evaluated the performance of our method on ten subcellular locations: cytoplasm, nucleus, endoplasmic reticulum (ER), extracellular region (ExR), mitochondria, cytosol, pseudopodium, posterior, exosome, and ribosome. An ablation study showed that k-mer and PseKNC features were more influential than the other features for predicting cytoplasm, nucleus, and ER localization. In contrast, physicochemical properties and Z-curve based features contributed the most to ExR and mitochondria detection. SHAP-based analysis revealed the relative importance of the features, providing better insight into the proposed approach.

AVAILABILITY

We have implemented a Docker container and an API so that end users can run their own sequences on our model. The datasets, the API code, and the Docker container are shared with the community on GitHub: https://github.com/smusleh/MSLP


