Zhan Zhao-Hui, You Zhu-Hong, Li Li-Ping, Zhou Yong, Yi Hai-Cheng
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China.
Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
Front Genet. 2018 Oct 8;9:458. doi: 10.3389/fgene.2018.00458. eCollection 2018.
Non-coding RNA (ncRNA) plays a crucial role in numerous biological processes, including gene expression and post-transcriptional gene regulation. The biological function of ncRNA is mostly realized by binding to related proteins. Therefore, an accurate understanding of the interactions between ncRNA and protein has a significant impact on current biological research. The major challenge at this stage is that traditional interaction prediction methods spend a great deal of time and computational resources on classification. LightGBM, an efficient classifier, can alleviate this long running time. In this study, we employed LightGBM as the integrated classifier and proposed a novel computational model for predicting ncRNA-protein interactions. More specifically, pseudo-Zernike moments and the singular value decomposition algorithm are employed to extract discriminative features from protein and ncRNA sequences. On four widely used datasets, RPI369, RPI488, RPI1807, and RPI2241, we evaluated the performance of the LightGBM-based model (LGBM) and obtained AUC values of 0.799, 0.914, 0.989, and 0.762, respectively. The results of 10-fold cross-validation showed that the proposed method performs much better than existing methods in predicting ncRNA-protein interaction patterns and could serve as a useful tool in proteomics research.
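As a rough illustration of two ingredients named in the abstract, the sketch below derives a fixed-length feature vector from a sequence with singular value decomposition and computes an AUC from classifier scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the k-mer position matrix, the function names, and the parameters `k` and `n_values` are hypothetical choices made for this example, and the pseudo-Zernike moments and the LightGBM classifier itself are not reproduced here.

```python
from itertools import product

import numpy as np


def kmer_position_matrix(seq, k=2, alphabet="ACGU"):
    """Binary matrix with one row per k-mer and one column per position.

    Assumption: this encoding is a hypothetical stand-in, not the
    sequence representation used in the paper.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    n_pos = max(len(seq) - k + 1, 0)
    mat = np.zeros((len(index), n_pos))
    for j in range(n_pos):
        row = index.get(seq[j:j + k])
        if row is not None:  # skip k-mers containing non-standard symbols
            mat[row, j] = 1.0
    return mat


def svd_features(mat, n_values=8):
    """Return the leading singular values, zero-padded to n_values entries."""
    out = np.zeros(n_values)
    if mat.size == 0:
        return out
    s = np.linalg.svd(mat, compute_uv=False)  # sorted in descending order
    m = min(n_values, s.size)
    out[:m] = s[:m]
    return out


def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


if __name__ == "__main__":
    features = svd_features(kmer_position_matrix("AUGGCUAGCUAGGCUA"))
    print(features.shape)
    print(auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With this toy encoding every column holds a single 1, so the singular values reduce to the square roots of the k-mer counts; a richer sequence encoding would be needed for the decomposition to capture more than composition. In a full pipeline, vectors of this kind for the ncRNA and the protein would be concatenated and passed to the classifier, with the AUC averaged over the cross-validation folds.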