Acharya Vasundhara, Choi Diana, Yener Bülent, Beamer Gillian
Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA 02155, USA.
IEEE Access. 2024;12:17164-17194. doi: 10.1109/access.2024.3359989. Epub 2024 Jan 30.
Tuberculosis (TB), primarily affecting the lungs, is caused by the bacterium Mycobacterium tuberculosis and poses a significant health risk. Detecting acid-fast bacilli (AFB) in stained samples is critical for TB diagnosis. Whole slide (WS) imaging allows these stained samples to be examined digitally. However, current deep-learning approaches to analyzing large whole slide images (WSIs) often employ patch-wise analysis, potentially missing the complex spatial patterns observed in the granuloma that are essential for accurate TB classification. To address this limitation, we propose an approach that models cell characteristics and interactions as a graph, capturing both cell-level information and the overall tissue micro-architecture. Related cell graph-based works choose edge thresholds from the sparsity or density of the constructed graph; our method instead determines the threshold from biological information. We introduce a cell graph-based jumping knowledge neural network (CG-JKNN) that operates on cell graphs whose edge thresholds are selected based on the length of the cords and the size of the activated macrophage nucleus, so that edges reflect the actual biological interactions observed in the tissue. The primary process involves training a Convolutional Neural Network (CNN) to segment AFBs and macrophage nuclei, followed by converting large (42831 × 41159 pixels) lung histology images into cell graphs in which each node represents an activated macrophage nucleus or an AFB and edges denote their interactions. To enhance the interpretability of our model, we employ Integrated Gradients and Shapley Additive Explanations (SHAP). Our analysis incorporated a combination of 33 graph metrics and 20 cell morphology features.
Among traditional machine learning models, Extreme Gradient Boosting (XGBoost) was the best performer, achieving an F1 score of 0.9813 and an Area under the Precision-Recall Curve (AUPRC) of 0.9848 on the test set. Among graph-based models, our CG-JKNN was the top performer, attaining an F1 score of 0.9549 and an AUPRC of 0.9846 on the held-out test set. The integration of graph-based and morphological features proved highly effective, with CG-JKNN and XGBoost showing promising results in classifying instances as AFB or activated macrophage nuclei. The features identified as significant by our models closely align with the criteria used by pathologists in practice, highlighting the clinical applicability of our approach. Future work will explore knowledge distillation techniques and graph-level classification into distinct TB progression categories.
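The cell graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example centroids, and the threshold value are all hypothetical, and the abstract does not state the actual threshold (it says only that it is derived from cord length and activated macrophage nucleus size). Each segmented object becomes a node, and two nodes are linked when their centroids lie within the chosen distance.

```python
from itertools import combinations
from math import dist


def build_cell_graph(centroids, max_edge_length):
    """Link every pair of nodes whose centroids are within max_edge_length.

    centroids: list of (x, y) positions, one per segmented AFB or nucleus.
    max_edge_length: distance threshold in the same units as the centroids.
    Returns an adjacency dict mapping node index -> set of neighbour indices.
    """
    adjacency = {i: set() for i in range(len(centroids))}
    # Brute-force pairwise comparison, O(n^2); fine for a sketch, but a
    # spatial index would be needed for a full whole slide image.
    for i, j in combinations(range(len(centroids)), 2):
        if dist(centroids[i], centroids[j]) <= max_edge_length:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def node_degrees(adjacency):
    """Degree of each node, one simple example of a per-node graph metric."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Hypothetical centroids in pixel coordinates and a hypothetical threshold.
centroids = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0), (100.0, 100.0)]
graph = build_cell_graph(centroids, max_edge_length=6.0)
degrees = node_degrees(graph)
```

With these illustrative values only the first two nodes are close enough to be connected; the other two remain isolated. Per-node quantities such as degree could then be combined with morphology features as classifier input, in the spirit of the graph metrics mentioned above.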