Chen Pingjun, El Hussein Siba, Xing Fuyong, Aminu Muhammad, Kannapiran Aparajith, Hazle John D, Medeiros L Jeffrey, Wistuba Ignacio I, Jaffray David, Khoury Joseph D, Wu Jia
Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
Department of Pathology, University of Rochester Medical Center, Rochester, NY 14642, USA.
Cancers (Basel). 2022 May 13;14(10):2398. doi: 10.3390/cancers14102398.
Identifying the progression of chronic lymphocytic leukemia (CLL) to accelerated CLL (aCLL) or its transformation to diffuse large B-cell lymphoma (Richter transformation; RT) has significant clinical implications because it prompts a major change in patient management. However, differentiating between these disease phases can be challenging in routine practice. Unsupervised learning has gained increasing attention because of its substantial potential for discovering intrinsic patterns in data. Here, we demonstrate that cellular feature engineering, which identifies cellular phenotypes via unsupervised clustering, provides the most robust analytic performance in analyzing digitized pathology slides (accuracy = 0.925, AUC = 0.978) when compared with alternative approaches: mixed features, supervised features, unsupervised/mixed/supervised feature fusion and selection, and patch-based convolutional neural network (CNN) feature extraction. We further validate the reproducibility and robustness of unsupervised feature extraction via stability and repeated splitting analyses, supporting its utility as a diagnostic aid in identifying CLL patients with histologic evidence of disease progression. This study serves as proof of principle that an unsupervised machine learning scheme can enhance diagnostic accuracy for heterogeneous histologic patterns that pathologists might not easily see.
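The idea of deriving cellular phenotypes by unsupervised clustering and then summarizing each slide by its phenotype composition can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name a clustering algorithm, so plain k-means is assumed here, and the two per-cell features (nuclear area and mean intensity), the synthetic cell populations, and all function names are hypothetical. A downstream classifier trained on the resulting profiles is omitted.

```python
import random
from collections import Counter


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def phenotype_profile(cells, centers):
    """Fraction of a slide's cells assigned to each cluster (phenotype)."""
    counts = Counter(nearest(c, centers) for c in cells)
    return [counts.get(j, 0) / len(cells) for j in range(len(centers))]


def make_slide(n_cells, frac_large, rng):
    """Synthetic slide: a mix of hypothetical 'small' and 'large' cells.

    Each cell is (nuclear_area, mean_intensity); the values are made up.
    """
    cells = []
    for _ in range(n_cells):
        if rng.random() < frac_large:
            cells.append((rng.gauss(90.0, 4.0), rng.gauss(0.6, 0.03)))
        else:
            cells.append((rng.gauss(40.0, 4.0), rng.gauss(0.3, 0.03)))
    return cells


rng = random.Random(42)
slide_a = make_slide(200, 0.10, rng)  # mostly small cells
slide_b = make_slide(200, 0.50, rng)  # many large cells

# Cluster cells pooled across slides, then describe each slide by the
# proportion of its cells that fall into each discovered phenotype.
centers = kmeans(slide_a + slide_b, k=2)
profile_a = phenotype_profile(slide_a, centers)
profile_b = phenotype_profile(slide_b, centers)
print(profile_a, profile_b)
```

In this toy setup the nuclear-area feature dominates the distance because the features are not standardized; with real measurements, features on different scales would normally be rescaled before clustering.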