Han Shuangkai, Liu Lin
School of Information, Yunnan Normal University, Kunming, China.
Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, China.
Comput Struct Biotechnol J. 2024 May 3;23:2034-2048. doi: 10.1016/j.csbj.2024.04.052. eCollection 2024 Dec.
Numerous studies have demonstrated that understanding the subcellular localization of non-coding RNAs (ncRNAs) is pivotal in elucidating their roles and regulatory mechanisms in cells. Although more than ten computational models exist for predicting the subcellular localization of ncRNAs, most of them are designed solely for single-label prediction. In reality, ncRNAs often localize to multiple subcellular compartments. Furthermore, existing multi-label localization prediction models do not adequately address the scarcity of training samples and the class imbalance in ncRNA datasets. To address these limitations, this study proposes a novel multi-label localization prediction model for ncRNAs, named GP-HTNLoc. To mitigate class imbalance, GP-HTNLoc trains head and tail location labels separately. Additionally, GP-HTNLoc introduces a graph prototype module to enhance its performance in small-sample, multi-label scenarios. Experimental results based on 10-fold cross-validation on benchmark datasets demonstrate that GP-HTNLoc achieves competitive predictive performance. The average results from 10 rounds of testing on an independent dataset show that GP-HTNLoc outperforms the best existing models on the human lncRNA, human snoRNA, and human miRNA subsets, with average precision improvements of 31.5%, 14.2%, and 5.6%, respectively, reaching 0.685, 0.632, and 0.704. A user-friendly online GP-HTNLoc server is accessible at https://56s8y85390.goho.co.
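The abstract does not say how head and tail labels are separated or how average precision is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes labels are split by a simple sample-count threshold (the hypothetical `min_count` parameter) and that average precision is the common example-based ranking definition. The function names, compartment names, and toy data are all made up for illustration.

```python
from collections import Counter


def split_head_tail(label_sets, min_count):
    """Partition labels into head (frequent) and tail (rare) groups.

    Assumption: a label is "head" if it occurs in at least `min_count`
    samples; the paper may use a different criterion.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    head = sorted(l for l, c in counts.items() if c >= min_count)
    tail = sorted(l for l, c in counts.items() if c < min_count)
    return head, tail


def average_precision(scores, true_labels):
    """Example-based average precision for a single sample.

    scores: dict mapping each candidate label to a predicted score.
    true_labels: set of labels that actually apply to the sample.
    """
    if not true_labels:
        return 0.0
    # Rank labels from highest to lowest predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label in true_labels:
            hits += 1
            total += hits / rank  # precision at this relevant label's rank
    return total / len(true_labels)


# Toy multi-label annotations (hypothetical, not from the paper's datasets).
toy_labels = [
    {"nucleus", "cytoplasm"},
    {"nucleus"},
    {"cytoplasm", "exosome"},
    {"nucleus", "cytoplasm"},
    {"ribosome"},
]
head, tail = split_head_tail(toy_labels, min_count=2)

toy_scores = {"nucleus": 0.9, "exosome": 0.6, "cytoplasm": 0.2}
ap = average_precision(toy_scores, {"nucleus", "cytoplasm"})
```

With this toy data, the two frequent compartments land in the head group and the two rare ones in the tail group, which is the kind of partition that would let each group be trained with a strategy suited to its sample size.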