Eshleman Ryan, Singh Rahul
IEEE Trans Nanobioscience. 2017 Mar;16(2):140-147. doi: 10.1109/TNB.2017.2667402. Epub 2017 Feb 9.
Identifying the temporal progression of a set of biological samples is crucial for understanding the dynamics of the underlying molecular interactions, and it is often a basic step in data denoising and synchronization. Recovering the progression order also matters for problems such as cell lineage identification, disease progression, tumor classification, and epidemiology, and thus affects disciplines ranging from basic biology to drug discovery and public health. Current methods for this problem struggle when they must account for complex relationships within the data, such as grouping, partial ordering, or bifurcating and multifurcating progressions. We propose the notion of cluster spanning trees (CST), which can model both linear progressions and the complex progression relationships listed above in temporally evolving data. Through experiments on synthetic data sets as well as data sets from the cell cycle, cellular differentiation, phenotypic screening, and genetic variation, we show that the proposed CST approach outperforms existing methods in reconstructing the temporal progression of the data.