SubLocEP: a novel ensemble predictor of subcellular localization of eukaryotic mRNA based on machine learning.

Affiliations

Tianjin University.

School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa401.

Abstract

MOTIVATION

The location of an mRNA corresponds to the site of protein translation and contributes to precise spatial and temporal control of protein function. However, current approaches to assigning the subcellular localization of eukaryotic mRNA have important limitations: (1) decomposing the multi-class problem into multiple binary classifications makes the training process tedious; (2) most models trained with classical algorithms rely on features extracted from a single type of sequence information; (3) existing state-of-the-art models have not reached an ideal level of prediction and generalization ability. To assign the subcellular localization of eukaryotic mRNA more reliably, a better and more comprehensive model must be developed.

RESULTS

In this paper, SubLocEP is proposed as a two-layer integrated prediction model for accurate prediction of the location of sequence samples. Unlike existing models based on limited features, SubLocEP comprehensively considers additional feature attributes and combines them with LightGBM to generate single-feature classifiers. An initial integration model (single-layer model) is generated for each feature category. Subsequently, the two single-layer integration models are weighted (sequence-based : physicochemical properties = 3:2) to produce the final two-layer model. The performance of SubLocEP on independent datasets indicates that it is an accurate and stable prediction model with strong generalization ability. Additionally, an online tool has been developed that contains the experimental data and makes it convenient for users to estimate the subcellular localization of eukaryotic mRNA.
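
The two-layer weighting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how classifiers are merged inside each feature category, so simple averaging is assumed here, and the class labels, function names, and probability values are all hypothetical. Only the 3:2 weighting between the sequence-based and physicochemical models comes from the abstract. The per-classifier probabilities stand in for the outputs of trained single-feature LightGBM classifiers.

```python
from typing import List, Sequence, Tuple

# Hypothetical class labels; the abstract does not list the compartments.
CLASSES = ["loc_A", "loc_B", "loc_C"]


def average_probs(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Layer 1 (assumed): average the class probabilities produced by
    the single-feature classifiers of one feature category."""
    n = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[i] for p in prob_lists) / n for i in range(n_classes)]


def two_layer_predict(
    seq_probs: Sequence[Sequence[float]],
    phys_probs: Sequence[Sequence[float]],
    w_seq: float = 3.0,
    w_phys: float = 2.0,
) -> Tuple[str, List[float]]:
    """Layer 2: combine the two single-layer models with a 3:2 weighting
    (sequence-based : physicochemical), then pick the top class."""
    seq_layer = average_probs(seq_probs)
    phys_layer = average_probs(phys_probs)
    total = w_seq + w_phys
    combined = [
        (w_seq * s + w_phys * p) / total
        for s, p in zip(seq_layer, phys_layer)
    ]
    best = max(range(len(combined)), key=combined.__getitem__)
    return CLASSES[best], combined


# Made-up probabilities for one mRNA sequence, two classifiers per category.
seq_probs = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
phys_probs = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]]
label, probs = two_layer_predict(seq_probs, phys_probs)
print(label, [round(p, 2) for p in probs])
```

Because the weights are normalized by their sum, the combined scores remain a probability distribution whenever the inputs are, which keeps the two-layer output comparable to the single-layer outputs.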
