Seal Souvik, Vu Thao, Ghosh Tusharkanti, Wrobel Julia, Ghosh Debashis
Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA.
Bioinform Adv. 2022 May 23;2(1):vbac039. doi: 10.1093/bioadv/vbac039. eCollection 2022.
Multiplex imaging platforms have become popular for studying complex single-cell biology in the tumor microenvironment (TME) of cancer subjects. Studying the intensity of the proteins that regulate important cell functions is crucial for subject-specific risk assessment. The conventional approach requires selecting two thresholds: one to define the cells of the TME as positive or negative for a particular protein, and another to classify the subjects based on the proportion of positive cells. We present a threshold-free approach in which the distance between a pair of subjects is computed from the probability density of the protein in their TMEs. The resulting distance matrix can either be used to classify the subjects into meaningful groups or be used directly in a kernel machine regression framework to test for association with clinical outcomes. The method removes the subjectivity bias of the thresholding-based approach, enabling a simpler yet interpretable analysis. We analyze a lung cancer dataset, finding that differences in the density of the protein HLA-DR are significantly associated with overall survival, and a triple-negative breast cancer dataset, examining the effects of multiple proteins on survival and recurrence. The reliability of our method is demonstrated through extensive simulation studies.
The associated package is available at https://github.com/sealx017/DenVar.
Supplementary data are available at Bioinformatics Advances online.
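The threshold-free idea described in the abstract can be illustrated with a minimal sketch. It is not the DenVar package's API, and every function name, parameter, and data value here is hypothetical. The sketch assumes protein intensities scaled to [0, 1], a Gaussian kernel density estimate with a fixed bandwidth, and the Jensen-Shannon distance as the measure of difference between two subjects' densities; the abstract itself does not name the distance, so that choice is an assumption made for illustration.

```python
import math
import random


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def normalize(density):
    """Rescale grid evaluations so they sum to 1 (a discrete distribution)."""
    total = sum(density)
    return [d / total for d in density]


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the base-2 divergence)."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2.0 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


def distance_matrix(subjects, grid_size=100, bandwidth=0.05):
    """Pairwise distances between subjects' estimated intensity densities."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    densities = [normalize(gaussian_kde(s, grid, bandwidth)) for s in subjects]
    n = len(densities)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = js_distance(densities[i], densities[j])
    return dist


# Simulated (not real) per-cell intensities for three hypothetical subjects:
# subjects 0 and 1 share a low-expression profile, subject 2 is high-expression.
random.seed(0)
subjects = [
    [random.betavariate(2, 5) for _ in range(200)],
    [random.betavariate(2, 5) for _ in range(200)],
    [random.betavariate(5, 2) for _ in range(200)],
]
D = distance_matrix(subjects)
for row in D:
    print(" ".join(f"{d:.3f}" for d in row))
```

No cell is ever labelled positive or negative: the whole intensity distribution of each subject enters the comparison. A matrix such as `D` could then be passed to a clustering routine to group subjects, or converted to a kernel for association testing, as the abstract outlines.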