Zhang Shao-Wu, Fan Xiao-Nan
Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.
Med Chem. 2017;13(6):515-525. doi: 10.2174/1573406413666170510102405.
RNA-protein interactions (RPIs) play an important role in many cellular processes. In particular, noncoding RNA-protein interactions (ncRPIs) are involved in gene regulation and complex human diseases. High-throughput experiments have provided a large amount of valuable information about ncRPIs, but these experiments are expensive and time-consuming. Computational approaches have therefore been developed to predict ncRPIs efficiently and effectively.
In this work, we describe recent advances in predicting ncRPIs from three aspects: i) dataset construction; ii) sequence and structural feature representation; and iii) machine learning algorithms.
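As a rough illustration of what a sequence feature representation can look like, the sketch below computes normalized k-mer frequencies for an RNA sequence. This is one common encoding and is not necessarily the one used by any method covered in the review; the function name `kmer_frequencies`, the default `k=3`, and the `ACGU` alphabet are assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector of an RNA sequence.

    Hypothetical helper for illustration only. The vector has
    len(alphabet) ** k entries, ordered lexicographically by k-mer.
    """
    # Treat DNA-style input as RNA by mapping T to U.
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Windows containing ambiguous symbols (e.g. N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    # Avoid division by zero for sequences shorter than k.
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kmer_frequencies("ACGUACGU", k=2)
```

The same idea applies to proteins by swapping in an amino-acid alphabet (often a reduced one, to keep the vector short); the RNA and protein vectors can then be concatenated as input to a classifier.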
Current methods have successfully predicted ncRPIs, but most of them were trained and tested on small benchmark datasets derived from ncRNA-protein complexes in the PDB database. The generalization performance and robustness of these existing methods need further improvement.
As high-throughput technologies generate large numbers of ncRPIs, three future directions for predicting ncRPIs with machine learning deserve attention. The first is how to construct the negative sample set effectively. The second is the selection of novel and effective features from the sequences and structures of ncRNAs and proteins. The third is the design of more powerful predictors.
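To make the negative-sample problem concrete, the sketch below shows a simple baseline: randomly pairing RNAs and proteins that do not appear together in the positive set. It is only an illustration, not a method proposed in the review, and the function name `sample_negative_pairs` and the identifiers in the usage lines are hypothetical. Its main weakness is the reason the problem remains open: an unobserved pair is not necessarily a non-interacting pair.

```python
import random


def sample_negative_pairs(positives, n, seed=0):
    """Draw up to n (rna, protein) pairs absent from the positive set.

    Hypothetical baseline for illustration. Pairs are assumed negative
    only because they were not observed, which may be wrong.
    """
    pos = set(positives)
    # Sort so that results are reproducible for a given seed.
    rnas = sorted({r for r, _ in pos})
    proteins = sorted({p for _, p in pos})
    candidates = [(r, p) for r in rnas for p in proteins if (r, p) not in pos]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))


positives = [("rna1", "protA"), ("rna2", "protB"), ("rna3", "protC")]
negatives = sample_negative_pairs(positives, n=3)
```

Enumerating every candidate pair is acceptable for small benchmark sets but grows with the product of the RNA and protein counts, so a larger dataset would call for rejection sampling instead.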