School of Science, Jiangnan University, Wuxi 214122, China.
Biomolecules. 2024 Nov 1;14(11):1396. doi: 10.3390/biom14111396.
Over the past decade, inferring developmental trajectories from single-cell data has become a major challenge in bioinformatics. RNA velocity, which incorporates directional dynamics, has substantially advanced the study of single-cell trajectories. However, as single-cell RNA sequencing technology evolves, it generates complex, high-dimensional data with high noise levels. Existing trajectory inference methods overlook the distribution characteristics of cells and may perform inadequately under such conditions. To address this, we introduce CPvGTI, a Gaussian distribution-based trajectory inference method. CPvGTI uses a Gaussian mixture model, optimized by the Expectation-Maximization algorithm, to construct new cell populations in the original data space. By integrating RNA velocity, CPvGTI then employs Gaussian Process Regression to analyze the differentiation trajectories of these cell populations. We benchmark CPvGTI against several state-of-the-art methods on four structurally diverse simulated datasets and four real datasets. The simulation studies indicate that CPvGTI outperforms existing methods in pseudo-time prediction and structural reconstruction. Furthermore, CPvGTI identifies new branch trajectories in the human forebrain and mouse hematopoiesis datasets, which further supports its performance.
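The population-construction step described above relies on a Gaussian mixture model fitted by Expectation-Maximization. The following is a minimal sketch of that general technique on a one-dimensional toy dataset, not the CPvGTI implementation: the function names `gaussian_pdf` and `fit_gmm_em`, the quantile-based initialization, the fixed iteration count, and the synthetic data are all assumptions made for illustration. RNA velocity and Gaussian Process Regression are not covered here.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, n_components=2, n_iter=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper, for illustration only).

    Returns (weights, means, variances, responsibilities), where the
    responsibilities come from the final E-step.
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: place the initial means at evenly spaced quantiles.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])
        # M-step: update weights, means and variances from the soft assignments.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-12  # small constant avoids division by zero
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor keeps a component from collapsing
    return weights, means, variances, resp


# Toy data: two synthetic "cell populations" along a single expression axis.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]

weights, means, variances, resp = fit_gmm_em(data)

# Hard population label for each point: the component with the highest responsibility.
labels = [max(range(len(means)), key=lambda k: r[k]) for r in resp]
```

In this sketch each mixture component plays the role of one cell population, and `labels` assigns every point to its most probable component. Real single-cell data is high-dimensional, so a practical version would use multivariate Gaussians with full or diagonal covariance matrices.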