Department of Biomedical Engineering, Yale University, New Haven, CT, 06510, USA.
Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA.
Sci Rep. 2019 Dec 23;9(1):19671. doi: 10.1038/s41598-019-55796-2.
Simple multilinear methods, such as partial least squares regression (PLSR), are effective at interrelating dynamic, multivariate datasets of cell-molecular biology through high-dimensional arrays. However, data collected in vivo are more difficult to analyze, because animal-to-animal variability is often high, and each time point measured is usually a terminal endpoint for that animal. Observations are further complicated by the nesting of cells within tissues or tissue sections, which themselves are nested within animals. Here, we introduce principled resampling strategies that preserve the tissue-animal hierarchy of individual replicates and compute the uncertainty of multidimensional decompositions applied to global averages. Using molecular-phenotypic data from the mouse aorta and colon, we find that interpretation of decomposed latent variables (LVs) changes when PLSR models are resampled. Lagging LVs, which statistically improve global-average models, are unstable in resampled iterations that preserve nesting relationships, arguing that these LVs should not be mined for biological insight. Interestingly, resampling is less discriminatory for multidimensional regressions of in vitro data, where replicate-to-replicate variance is sufficiently low. Our work illustrates the challenges and opportunities in translating systems-biology approaches from cultured cells to living organisms. Nested resampling adds a straightforward quality-control step for interpreting the robustness of in vivo regression models.
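The abstract does not spell out the exact resampling scheme, so the following is only a minimal sketch of the idea under stated assumptions. It assumes a nested bootstrap (animals drawn with replacement, then tissue replicates drawn with replacement within each drawn animal), a PLS2 model whose X weights come from the SVD of the deflated cross-covariance, and an absolute-cosine score comparing each resampled LV with the corresponding global-average LV. The data layout and the names `pls_weights`, `global_average`, `nested_resample` and `lv_stability` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def pls_weights(X, Y, n_lv):
    """X-weight vectors (columns) of a PLS2 model with n_lv latent variables.

    Each weight is the dominant left singular vector of the deflated
    cross-covariance X_a^T Y; X is deflated by its score after every LV.
    Columns are mean-centered only (no unit-variance scaling here).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    W = np.empty((X.shape[1], n_lv))
    for a in range(n_lv):
        U, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
        w = U[:, 0]                      # unit-norm weight for LV a+1
        t = X @ w                        # scores on this LV
        p = X.T @ t / (t @ t)            # X loadings
        X = X - np.outer(t, p)           # deflate X before the next LV
        W[:, a] = w
    return W


def global_average(data):
    """One row per condition: mean over every tissue replicate of every animal."""
    return np.array([np.vstack(animals).mean(axis=0) for animals in data])


def nested_resample(data, rng):
    """Nested bootstrap that keeps replicates inside their animal.

    data[c] is the list of animals for condition c; each animal is a
    2-D array (tissue replicates x features). Animals are drawn with
    replacement, then replicates are drawn with replacement within
    each drawn animal, so replicates are never mixed across animals.
    """
    rows = []
    for animals in data:
        drawn = []
        for i in rng.integers(len(animals), size=len(animals)):
            tissue = animals[i]
            j = rng.integers(len(tissue), size=len(tissue))
            drawn.append(tissue[j])
        rows.append(np.vstack(drawn).mean(axis=0))
    return np.array(rows)


def lv_stability(data, n_x, n_lv, n_iter=200, seed=0):
    """Median |cosine| between each resampled LV weight and the global-average LV.

    The first n_x features are treated as X (predictors), the rest as Y.
    Values near 1 indicate an LV that survives nested resampling; LVs are
    matched by order, so an LV that swaps rank also scores as unstable.
    """
    rng = np.random.default_rng(seed)
    G = global_average(data)
    W_ref = pls_weights(G[:, :n_x], G[:, n_x:], n_lv)
    sims = np.empty((n_iter, n_lv))
    for k in range(n_iter):
        R = nested_resample(data, rng)
        W = pls_weights(R[:, :n_x], R[:, n_x:], n_lv)
        sims[k] = np.abs(np.sum(W * W_ref, axis=0))  # sign-invariant match
    return np.median(sims, axis=0)
```

For example, with `data` built as a list of conditions, each holding per-animal arrays whose first `n_x` columns are molecular measurements and whose remaining columns are phenotypes, `lv_stability(data, n_x, n_lv=3)` returns one score per LV. A lagging LV whose score falls well below the leading ones would be the kind of unstable LV the abstract cautions against interpreting. Resampling whole animals first is the design choice that keeps the tissue-animal hierarchy intact; a flat bootstrap over pooled replicates would understate animal-to-animal variance.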