School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Pudong District, 201210, Shanghai, China.
BMC Bioinformatics. 2021 Sep 3;22(Suppl 10):419. doi: 10.1186/s12859-021-04330-1.
RNA velocity is a novel and powerful concept which enables the inference of dynamical cell state changes from seemingly static single-cell RNA sequencing (scRNA-seq) data. However, accurate estimation of RNA velocity is still a challenging problem, and the underlying kinetic mechanisms of transcriptional and splicing regulations are not fully clear. Moreover, scRNA-seq data tend to be sparse compared with possible cell states, and a given dataset of estimated RNA velocities needs imputation for some cell states not yet covered.
We formulate RNA velocity prediction as a supervised classification problem for the first time: the cell state space is divided by direction into equal-sized segments that serve as classes, and the estimated RNA velocity vectors are treated as ground truth. We propose Velo-Predictor, an ensemble learning pipeline for predicting RNA velocities from scRNA-seq data. We test different models on two real datasets, and Velo-Predictor exhibits good performance, especially when XGBoost is used as the base predictor. Parameter analysis and visualization also show that the method is robust and able to make biologically meaningful predictions.
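The idea of turning velocity directions into class labels can be illustrated with a minimal sketch. It assumes a two-dimensional embedding, eight equal angular sectors, and sectors counted counter-clockwise from the positive x-axis. None of these choices is stated in the abstract, and the function name `direction_class` is hypothetical rather than part of Velo-Predictor.

```python
import math


def direction_class(vx, vy, n_classes=8):
    """Map a 2-D velocity vector to one of n_classes equal angular sectors.

    Assumption: sector 0 starts at the positive x-axis and sectors are
    numbered counter-clockwise. The number of sectors is illustrative.
    """
    # atan2 returns an angle in (-pi, pi]; shift it into [0, 2*pi).
    angle = math.atan2(vy, vx) % (2 * math.pi)
    sector_width = 2 * math.pi / n_classes
    # The final modulo guards against rounding that lands exactly on 2*pi.
    return int(angle // sector_width) % n_classes


# Hypothetical velocity vectors for a handful of cells.
velocities = [(1.0, 0.5), (0.5, 1.0), (-1.0, 0.5), (-1.0, -0.5), (0.5, -1.0)]
labels = [direction_class(vx, vy) for vx, vy in velocities]
```

Labels produced this way could serve as targets for any multi-class classifier trained on gene expression features, which is the role the abstract assigns to base predictors such as XGBoost.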
The accurate results show that Velo-Predictor can effectively simplify the procedure by learning a predictive model from gene expression data, which could help to construct a continuous landscape and give biologists an intuitive picture of the trend of cellular dynamics.