Systems Immunology Department, Weizmann Institute of Science, Rehovot, Israel.
School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel.
PLoS Comput Biol. 2022 Jul 15;18(7):e1010212. doi: 10.1371/journal.pcbi.1010212. eCollection 2022 Jul.
Longitudinal 'omics analytical methods are extensively used in the evolving field of precision medicine, enabling 'big data' recording and high-resolution interpretation of complex datasets that are driven by individual variations in response to perturbations such as disease pathogenesis, medical treatment or changes in lifestyle. However, inherent technical limitations in biomedical studies often result in feature-rich but sample-limited datasets. Analyzing such data with conventional methods is often challenging, since the repeated, high-dimensional measurements are dominated by inconsequential variations that must be filtered out in order to find the true, biologically relevant signal. Tensor methods for the analysis and meaningful representation of multiway data may prove useful to the biological research community by their advertised ability to tackle this challenge. In this study, we present tcam, a new unsupervised tensor factorization method for the analysis of multiway data. Building on cutting-edge developments in the field of tensor-tensor algebra, we characterize the unique mathematical properties of our method, namely: 1) preservation of geometric and statistical traits of the data, which enables uncovering information beyond the inter-individual variation that often dominates the analysis, especially in human studies; and 2) a natural and straightforward out-of-sample extension, making tcam amenable to integration in machine learning workflows. A series of re-analyses of real-world human experimental datasets showcases these theoretical properties, while providing empirical confirmation of tcam's utility in the analysis of longitudinal 'omics data.