Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, 50-370 Wrocław, Poland.
Phys Rev E. 2020 Sep;102(3-1):032402. doi: 10.1103/PhysRevE.102.032402.
Single-particle tracking (SPT) has become a popular tool for studying the intracellular transport of molecules in living cells. Inferring the character of their dynamics is important because the dynamics determines the organization and functions of the cells. For this reason, one of the first steps in the analysis of SPT data is to identify the diffusion type of the observed particles. The most popular method for identifying the class of a trajectory is based on the mean-square displacement (MSD). However, because of its known limitations, several other approaches have already been proposed. With recent advances in algorithms and the development of modern hardware, classification approaches rooted in machine learning (ML) are of particular interest. In this work, we apply two ML ensemble algorithms, namely random forest and gradient boosting, to the problem of trajectory classification. We present a new set of features used to transform raw trajectory data into the input vectors required by the classifiers. The resulting models are then applied to real data for G protein-coupled receptors and G proteins. The classification results are compared with those of recent statistical methods that go beyond the MSD.
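The conventional MSD-based classification that the abstract contrasts with the ML classifiers can be sketched as follows. This is a minimal pure-Python illustration, not the authors' method. It assumes a 2D trajectory sampled at a constant frame interval. The function names (`time_averaged_msd`, `anomalous_exponent`, `classify_by_msd`, `simulate_brownian`) and the tolerance of 0.1 around α = 1 are hypothetical choices made for this example. The sketch estimates the time-averaged MSD, fits the anomalous exponent α in MSD(t) ∝ t^α by least squares on a log-log scale, and labels the trajectory as subdiffusive (α < 1), normal (α ≈ 1), or superdiffusive (α > 1).

```python
import math
import random


def time_averaged_msd(xs, ys, max_lag):
    """Time-averaged MSD of a 2D trajectory for lags 1..max_lag (in frames)."""
    n = len(xs)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the trajectory length")
    msd = []
    for lag in range(1, max_lag + 1):
        total = 0.0
        # Average squared displacement over all overlapping windows of this lag.
        for i in range(n - lag):
            dx = xs[i + lag] - xs[i]
            dy = ys[i + lag] - ys[i]
            total += dx * dx + dy * dy
        msd.append(total / (n - lag))
    return msd


def anomalous_exponent(msd, dt=1.0):
    """Least-squares slope of log MSD vs log time lag, i.e. alpha in MSD ~ t^alpha.

    Requires at least two lags and strictly positive MSD values.
    """
    log_t = [math.log((k + 1) * dt) for k in range(len(msd))]
    log_m = [math.log(m) for m in msd]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    cov = sum((a - mean_t) * (b - mean_m) for a, b in zip(log_t, log_m))
    var = sum((a - mean_t) ** 2 for a in log_t)
    return cov / var


def classify_by_msd(alpha, tol=0.1):
    """Naive MSD rule; the tolerance band around alpha = 1 is an assumption."""
    if alpha < 1 - tol:
        return "subdiffusion"
    if alpha > 1 + tol:
        return "superdiffusion"
    return "normal diffusion"


def simulate_brownian(n, sigma=1.0, seed=None):
    """Hypothetical helper: 2D Brownian path with Gaussian steps of std sigma."""
    rng = random.Random(seed)
    xs, ys = [0.0], [0.0]
    for _ in range(n - 1):
        xs.append(xs[-1] + rng.gauss(0.0, sigma))
        ys.append(ys[-1] + rng.gauss(0.0, sigma))
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate_brownian(5000, seed=1)
    alpha = anomalous_exponent(time_averaged_msd(xs, ys, max_lag=10))
    print(f"alpha = {alpha:.2f} -> {classify_by_msd(alpha)}")
```

The fit here uses only the first few lags. This is a common heuristic, because the time-averaged MSD becomes noisy at lags comparable to the trajectory length. That noise on short trajectories is one frequently cited limitation of MSD-based classification. Quantities such as α are natural candidate inputs for feature-based classifiers like random forest or gradient boosting. The specific feature set used in the paper is not reproduced here.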