Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China.
Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands.
Bioinformatics. 2018 Jul 1;34(13):2185-2194. doi: 10.1093/bioinformatics/bty085.
Long non-coding RNAs (lncRNAs) have become a hot topic in RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Because experimentally identifying the subcellular localization of lncRNAs is costly and time-consuming, computational methods are urgently needed. However, to the best of our knowledge, no computational tool for predicting lncRNA subcellular locations exists to date.
In this study, we report lncLocator, an ensemble classifier-based predictor of lncRNA subcellular localization. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and we construct four classifiers by feeding each of these two feature types to both a support vector machine (SVM) and a random forest (RF). We then use a stacked ensemble strategy to combine the four classifiers into the final prediction. The current lncLocator predicts five subcellular localizations of lncRNAs (cytoplasm, nucleus, cytosol, ribosome and exosome) and yields an overall accuracy of 0.59 on the constructed benchmark dataset.
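To make the k-mer features and the stacking input more concrete, here is a minimal Python sketch using only the standard library. The function names, the choice of k = 1, 2, 3, the conversion of U to T, and the plain concatenation of base-classifier probability vectors as meta-features are illustrative assumptions, not lncLocator's actual implementation. The deep-model features, the SVM/RF classifiers and the meta-learner itself are not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_vocabulary(k):
    """All 4**k k-mers over A, C, G, T in lexicographic order."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def kmer_frequencies(seq, k):
    """Normalized k-mer frequency vector for one sequence (sliding window, step 1)."""
    # Assumption: RNA sequences are mapped to the DNA alphabet (U -> T).
    seq = seq.upper().replace("U", "T")
    index = {kmer: i for i, kmer in enumerate(kmer_vocabulary(k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing ambiguous bases such as N
            counts[index[kmer]] += 1
            total += 1
    if total == 0:  # sequence shorter than k or only ambiguous bases
        return [0.0] * len(index)
    return [c / total for c in counts]


def multi_k_features(seq, ks=(1, 2, 3)):
    """Concatenate frequency vectors for several k (hypothetical choice of ks)."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    return features


def stack_meta_features(prob_vectors):
    """Concatenate each base classifier's class-probability vector into one
    meta-feature vector, a common way to build the input to a stacking meta-learner."""
    meta = []
    for probs in prob_vectors:
        meta.extend(probs)
    return meta
```

For example, `multi_k_features("ACGUACGU")` returns a vector of length 4 + 16 + 64 = 84. With four base classifiers, each outputting probabilities over the five compartments, `stack_meta_features` yields a 20-dimensional vector on which a meta-classifier could then be trained.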
The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator.
Supplementary data are available at Bioinformatics online.