Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States.
SWOG Statistics and Data Management Center, Fred Hutchinson Cancer Center, Seattle, WA 98109, United States.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae522.
Multiplexed spatial proteomics reveals the spatial organization of cells in tumors, which is associated with important clinical outcomes such as survival and treatment response. This spatial organization is often summarized using spatial summary statistics, including Ripley's K and Besag's L. However, if multiple regions of the same tumor are imaged, it is unclear how to relate the resulting per-image summaries to a single patient-level endpoint. We evaluate extant approaches for accommodating multiple images when associating summary statistics with outcomes. First, we consider averaging-based approaches wherein multiple summaries for a single sample are combined in a weighted mean. We then propose a novel class of ensemble testing approaches in which we simulate random weights used to aggregate summaries, test for an association with outcomes, and combine the $P$-values. We systematically evaluate the performance of these approaches via simulation and application to data from non-small cell lung cancer, colorectal cancer, and triple-negative breast cancer. We find that the optimal strategy varies, but a simple weighted average of the summary statistics based on the number of cells in each image often offers the highest power and controls type I error effectively. When the size of the imaged regions varies, incorporating this variation into the weighted aggregation may yield additional power in cases where the varying size is informative. Ensemble testing (but not resampling) offered high power and type I error control across conditions in our simulated data sets.
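The two aggregation strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the function names are hypothetical, the outcome is treated as continuous and tested with a simple linear regression, the random weights are drawn from a flat Dirichlet distribution, and the $P$-values are combined with a Cauchy combination rule. The abstract does not specify any of these details.

```python
import numpy as np
from scipy import stats


def weighted_patient_summary(summaries, weights):
    """Weighted mean of the per-image summaries for one patient.

    `weights` could be, e.g., the number of cells in each image.
    """
    summaries = np.asarray(summaries, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * summaries) / np.sum(weights))


def cauchy_combine(pvalues):
    """Combine p-values with a Cauchy combination rule (assumed here;
    the abstract does not name the combining method)."""
    p = np.clip(np.asarray(pvalues, dtype=float), 1e-15, 1 - 1e-15)
    t = np.mean(np.tan((0.5 - p) * np.pi))
    return float(0.5 - np.arctan(t) / np.pi)


def ensemble_test(per_patient_summaries, outcome, n_draws=200, seed=0):
    """Ensemble test: draw random weights, aggregate, test, combine.

    Assumes a continuous outcome tested by simple linear regression;
    a survival endpoint would need a different model.
    """
    rng = np.random.default_rng(seed)
    pvalues = []
    for _ in range(n_draws):
        # One random weight vector per patient (flat Dirichlet, an assumption).
        x = [
            weighted_patient_summary(s, rng.dirichlet(np.ones(len(s))))
            for s in per_patient_summaries
        ]
        pvalues.append(stats.linregress(x, outcome).pvalue)
    return cauchy_combine(pvalues)


# Toy data (made up): per-image summaries for four patients and one outcome each.
toy_summaries = [[1.0, 2.0], [2.0, 2.5, 3.0], [3.5], [4.0, 5.0]]
toy_outcome = [1.0, 2.1, 2.9, 4.2]
toy_p = ensemble_test(toy_summaries, toy_outcome)
```

In this sketch, passing per-image cell counts as `weights` to `weighted_patient_summary` corresponds to the cell-count-weighted average mentioned in the abstract, while `ensemble_test` replaces the fixed weights with repeated random draws.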