Naghdloo Amin, Tessone Dean, Nagaraju Rajiv M, Zhang Brian, Kang Jeffrey, Li Shouyi, Oberai Assad, Hicks James B, Kuhn Peter
Convergent Science Institute in Cancer, University of Southern California, Los Angeles, 90089, CA, USA.
Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, 90089, CA, USA.
bioRxiv. 2025 May 24:2025.05.21.655334. doi: 10.1101/2025.05.21.655334.
Tumor-associated cells derived from a liquid biopsy are promising biomarkers for cancer detection, diagnosis, prognosis, and monitoring. However, their rarity, heterogeneity, and plasticity make precise identification and biological characterization difficult, which limits their clinical utility. Enrichment-free approaches that use whole slide imaging of all circulating cells offer a comprehensive and unbiased strategy for capturing the full spectrum of tumor-associated cell phenotypes. However, current analysis methods often depend on engineered features and manual expert review, making them sensitive to technical variation and subjective bias. These limitations highlight the need for a better feature representation to improve the performance and reproducibility of large-scale patient cohort analyses. In this study, we present a deep contrastive learning framework for learning features of all circulating cells, enabling robust identification and stratification of single cells in whole slide immunofluorescence microscopy images. We demonstrate the performance of the learned features in classifying diverse cell phenotypes in the liquid biopsy, achieving an accuracy of 92.64%. We further demonstrate that the learned features improve performance in downstream applications such as outlier detection and clustering. Lastly, our feature representation enables automated identification and enumeration of distinct rare cell phenotypes, achieving an average F1-score of 0.93 across cell lines mimicking circulating tumor cells (CTCs) and endothelial cells in contrived samples and an average F1-score of 0.858 across CTC phenotypes in clinical samples. This workflow has significant implications for scalable analysis of tumor-associated cellular biomarkers in clinical prognosis and personalized treatment strategies.
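For readers unfamiliar with the metric, the following is a minimal sketch of how an average F1-score across phenotypes could be computed. It assumes an unweighted (macro) average over classes, which the abstract does not specify. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but the cell was something else
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumption about the averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = ["CTC", "CTC", "WBC", "WBC", "WBC", "endothelial"]
y_pred = ["CTC", "WBC", "WBC", "WBC", "WBC", "endothelial"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))
```

An unweighted average gives rare phenotypes the same weight as abundant ones, which matters when the cells of interest are a tiny fraction of all circulating cells. A support-weighted average would instead be dominated by the common classes.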