Department of Oral Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA.
Department of Radiation Oncology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Genes (Basel). 2024 May 3;15(5):582. doi: 10.3390/genes15050582.
High-dimensional biomedical datasets have become easier to collect over the last two decades with the advent of multi-omic and single-cell experiments, which can generate over 1000 measurements per sample or per cell. More recently, attention has turned to the need for longitudinal datasets, with the appreciation that important dynamic changes occur along transitions between health and disease. Analysis of longitudinal omics data comes with many challenges, including type I error inflation and a corresponding loss in power when thousands of hypothesis tests are needed. Multivariate analysis can yield approaches with higher statistical power; however, multivariate methods for longitudinal data are currently limited. We propose a multivariate distance-based drift-diffusion framework (MD3F) to address the need for a multivariate approach to longitudinal, high-throughput datasets. We show that MD3F can yield surprisingly simple yet valid and powerful hypothesis testing and estimation approaches using generalized linear models. Through simulation and application studies, we show that MD3F is robust and can offer a broadly applicable method for assessing multivariate dynamics in omics data.