Identify ncRNA Subcellular Localization via Graph Regularized k-Local Hyperplane Distance Nearest Neighbor Model on Multi-Kernel Learning.

Authors

Zhou Haohao, Wang Hao, Tang Jijun, Ding Yijie, Guo Fei

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3517-3529. doi: 10.1109/TCBB.2021.3107621. Epub 2022 Dec 8.

Abstract

Non-coding RNAs (ncRNAs) are RNAs that do not encode protein sequences. Emerging evidence shows that many ncRNAs may participate in numerous biological processes and are widely involved in many types of cancer. Understanding their function is therefore of great importance. As with proteins, the functions of ncRNAs depend on their subcellular localization. Traditional high-throughput wet-lab methods for identifying subcellular localization are time-consuming and costly. In this paper, we propose a novel computational method based on multi-kernel learning that identifies multi-label ncRNA subcellular localizations via a graph regularized k-local hyperplane distance nearest neighbor algorithm. First, we construct six types of sequence-based feature descriptors and select the important feature vectors. Then, we build a multi-kernel learning model with the Hilbert-Schmidt independence criterion (HSIC) to obtain optimal weights for the various features. Furthermore, we propose the graph regularized k-local hyperplane distance nearest neighbor algorithm (GHKNN) as a binary classification model for detecting a single kind of ncRNA subcellular localization. Finally, we apply a One-vs-Rest strategy to decompose the multi-label problem of ncRNA subcellular localization. Our method achieves excellent performance on three ncRNA datasets and three human ncRNA datasets, and outperforms other outstanding machine learning methods. Compared with existing methods, our model also performs well, especially on small datasets. We expect that this model will be useful for predicting subcellular localization and for studying the important functional mechanisms of ncRNAs. Furthermore, we have established a user-friendly web server (http://ncrna.lbci.net/) implementing our method, which can be easily used by most experimental scientists.
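To make two of the steps named in the abstract more concrete, the following Python sketch shows how a One-vs-Rest decomposition turns multi-label localizations into binary targets, and how the empirical HSIC between a feature kernel and a label kernel can be used to score feature types. It is only an illustration under stated assumptions: the toy feature values, the location names, the RBF kernel with its `gamma`, and the rule of setting weights proportional to HSIC are all hypothetical choices made for this example. The abstract describes obtaining optimal weights through a multi-kernel learning model, which this simple normalization does not reproduce, and GHKNN itself is not implemented here.

```python
import math


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def center(K):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, for symmetric K and L."""
    n = len(K)
    Kc = center(K)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def one_vs_rest(label_sets, locations):
    """Turn multi-label sets into one +1/-1 target vector per location."""
    return {loc: [1 if loc in s else -1 for s in label_sets] for loc in locations}


def kernel_weights(kernels, y):
    """Assumed heuristic: weight each kernel by its HSIC with the label kernel y y^T."""
    L = [[a * b for b in y] for a in y]
    scores = [max(hsic(K, L), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine(kernels, weights):
    """Weighted sum of kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: four sequences described by two one-dimensional feature views.
view_a = [[0.0], [0.1], [1.0], [1.1]]   # tracks the "nucleus" label
view_b = [[0.0], [1.0], [0.1], [1.1]]   # largely unrelated to the "nucleus" label
label_sets = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"cytoplasm"}, {"cytoplasm"}]

targets = one_vs_rest(label_sets, ["nucleus", "cytoplasm"])
kernels = [rbf_kernel(view_a), rbf_kernel(view_b)]
weights = kernel_weights(kernels, targets["nucleus"])
combined = combine(kernels, weights)
print("weights for the nucleus task:", weights)
```

In this toy setup the first view receives the larger weight for the "nucleus" task, because its kernel aligns with the label kernel. A binary classifier such as GHKNN would then be trained once per location on the combined kernel.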
