Pan Xiaoyong, Fan Yong-Xian, Yan Junchi, Shen Hong-Bin
Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Dongchuan Road, Shanghai, China.
Present Address: Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark.
BMC Genomics. 2016 Aug 9;17:582. doi: 10.1186/s12864-016-2931-8.
Non-coding RNAs (ncRNAs) play crucial roles in many biological processes, such as post-transcriptional gene regulation. ncRNAs mainly function through interactions with RNA-binding proteins (RBPs). To understand the function of an ncRNA, a fundamental step is to identify which proteins interact with it. Computational prediction of the interacting RBPs is therefore promising, but the major challenge is that the interaction patterns or motifs are difficult to find.
In this study, we propose IPMiner (Interaction Pattern Miner), a computational method that predicts ncRNA-protein interactions from sequences using deep learning, with stacked ensembling to further improve performance. A key merit of IPMiner is that it mines hidden sequential interaction patterns from sequence composition features of protein and RNA sequences using a stacked autoencoder; the learned hidden features are then fed into random forest models. Finally, stacked ensembling integrates the different predictors to further improve prediction performance. The experimental results indicate that IPMiner achieves superior performance on the tested lncRNA-protein interaction dataset, with an accuracy of 0.891, sensitivity of 0.939, specificity of 0.831, precision of 0.945, and Matthews correlation coefficient of 0.784. We further evaluate IPMiner comprehensively on other RNA-protein interaction datasets, where it outperforms state-of-the-art methods, with improvements of over 20% on some benchmark datasets. In addition, we apply IPMiner to large-scale prediction of the ncRNA-protein network, where it achieves promising prediction performance.
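For readers unfamiliar with the five reported measures, the following minimal Python sketch shows how they are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the counts behind the values reported above.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are returned as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    # MCC denominator: geometric mean of the four marginal totals.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on interacting pairs
        "specificity": ratio(tn, tn + fp),   # recall on non-interacting pairs
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Illustrative counts only; not taken from the study.
    for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the positive and negative classes are imbalanced.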
By integrating a deep neural network with stacked ensembling, IPMiner can automatically learn high-level abstract features from simple sequence composition features, and these features have strong discriminative ability for detecting RNA-protein interactions. IPMiner achieved high performance on our constructed lncRNA-protein benchmark dataset and on other RNA-protein datasets. The IPMiner tool is available at http://www.csbio.sjtu.edu.cn/bioinf/IPMiner .
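As a rough illustration of what a "simple sequence composition feature" can look like, the sketch below computes normalized k-mer frequencies for a sequence. This is a generic example under stated assumptions, not the IPMiner implementation: the function name `kmer_frequencies`, the choice of k, and the plain four-letter RNA alphabet are all assumptions, since the abstract does not specify the exact encoding.

```python
from itertools import product

RNA_ALPHABET = "ACGU"  # assumed alphabet; the abstract does not specify one


def kmer_frequencies(seq, k, alphabet=RNA_ALPHABET):
    """Return the normalized frequency of every possible k-mer in `seq`.

    The output is a fixed-length vector of len(alphabet) ** k values in
    lexicographic k-mer order. Windows containing symbols outside the
    alphabet (for example 'N') are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    valid_windows = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            valid_windows += 1
    if valid_windows == 0:
        return [0.0] * len(kmers)
    return [counts[m] / valid_windows for m in kmers]


if __name__ == "__main__":
    vec = kmer_frequencies("AUGGCUAGCUAGGA", k=2)
    print(len(vec), round(sum(vec), 6))
```

Vectors of this kind have a fixed length regardless of sequence length, which is what makes them usable as input to a downstream learner such as an autoencoder or a random forest.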