Division of Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA.
Chem Biodivers. 2022 Dec;19(12):e202200746. doi: 10.1002/cbdv.202200746. Epub 2022 Nov 28.
Cancer cell lines serve as in vitro model systems for investigating therapeutic interventions. Recent advances in high-throughput genomic profiling have enabled systematic comparison between cell lines and patient tumor samples. The highly interconnected nature of biological data, however, presents a challenge when mapping patient tumors to cell lines. Standard clustering methods can be particularly susceptible to the high level of noise in these datasets, and they output clusters at only a single scale of the data, which is not known in advance. In light of these challenges, we present NetCellMatch, a robust framework for network-based matching of cell lines to patient tumors. NetCellMatch first constructs a global network across all cell line and patient samples using their genomic similarity. A multi-scale community detection algorithm then integrates information across topologically meaningful (clustering) scales to obtain Network-Based Matching Scores (NBMS). NBMS are measures of cluster robustness that map patient tumors to cell lines. We use NBMS to determine representative "avatar" cell lines for subgroups of patients. We apply NetCellMatch to reverse-phase protein array data obtained from The Cancer Genome Atlas for patients and from the MD Anderson Cell Line Project for cell lines. Along with identifying avatar cell lines, we evaluate connectivity patterns for breast, lung, and colon cancer and explore the proteomic profiles of avatars and their top-matching patients. Our results demonstrate the framework's ability to identify both patient-cell line matches and potential proteomic drivers of similarity. Our methods are general and can be easily adapted to other 'omic datasets.
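The matching idea described in the abstract (one global similarity network over patients and cell lines, clusters examined at several scales, and a score reflecting how robustly a patient and a cell line fall together) can be illustrated with a small sketch. This is not the NetCellMatch implementation. It assumes Pearson correlation as the similarity measure and, as a hypothetical stand-in for multi-scale community detection, treats each similarity cutoff as one "scale" and uses the connected components of the thresholded graph as that scale's clusters. The score is taken here as the fraction of scales at which a patient and a cell line co-cluster, and the avatar for a patient subgroup is the cell line with the highest mean score. All function names and toy profiles are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_labels(nodes, edges):
    """Connected components via union-find; maps each node to its component root."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return {v: find(v) for v in nodes}


def matching_scores(profiles, patients, cell_lines, thresholds):
    """Hypothetical score: fraction of scales at which a patient and a cell line co-cluster."""
    nodes = list(profiles)
    # Global network over all samples (patients and cell lines together).
    sim = {(u, v): pearson(profiles[u], profiles[v]) for u, v in combinations(nodes, 2)}
    counts = {(p, c): 0 for p in patients for c in cell_lines}
    for t in thresholds:
        # One "scale" (stand-in assumption): keep only edges whose similarity exceeds t.
        edges = [pair for pair, s in sim.items() if s > t]
        label = cluster_labels(nodes, edges)
        for p, c in counts:
            if label[p] == label[c]:
                counts[(p, c)] += 1
    return {pair: k / len(thresholds) for pair, k in counts.items()}


def avatar(scores, patient_group, cell_lines):
    """Cell line with the highest mean score across a subgroup of patients."""
    return max(cell_lines,
               key=lambda c: sum(scores[(p, c)] for p in patient_group) / len(patient_group))


# Toy profiles made up for illustration: P* are patients, C* are cell lines.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [4.0, 3.0, 2.0, 1.0],
    "C1": [1.1, 2.1, 2.9, 4.2],
    "C2": [4.1, 2.9, 2.2, 0.9],
}
scores = matching_scores(profiles, ["P1", "P2"], ["C1", "C2"], [0.0, 0.5, 0.9, 0.999])
print(scores[("P1", "C1")], avatar(scores, ["P1"], ["C1", "C2"]))  # prints: 0.75 C1
```

In the toy data, P1 and C1 correlate at about 0.995, so they share a cluster at the three lower cutoffs but separate at 0.999, giving a score of 0.75. Thresholded connected components were chosen only because they need no external dependencies; a real analysis would substitute an actual multi-scale community detection method and a similarity measure suited to the assay.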