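Artificial intelligence-based analysis of whole-body bone scintigraphy: the quest for the optimal deep learning algorithm and comparison with human observer performance.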

PURPOSE: Whole-body bone scintigraphy (WBS) is one of the most widely used modalities for diagnosing malignant bone diseases at an early stage. However, interpreting the scans is time-consuming and demands stamina and experience. Moreover, interpretation of WBS scans in the early stages of disease can be challenging because the patterns often resemble a normal appearance and are prone to subjective interpretation. To simplify this laborious, subjective, and error-prone task, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans as normal or abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with that of human observers.

MATERIALS AND METHODS: After applying our exclusion criteria to 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into training and test sets, with a fraction of the training data held out for validation. Ten different CNN models were applied in single-view and dual-view (posterior and anterior) input modes to find the optimal model for each analysis. In addition, three different methods, namely squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features in the dual-view input models. Model performance was reported as area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity, and models were compared using the DeLong test applied to the ROC curves. The test dataset was also evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers.
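The four reported metrics can be illustrated with a minimal sketch. It is not the study's evaluation code: the function names `roc_auc` and `threshold_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed in its Mann-Whitney form (the fraction of abnormal/normal pairs in which the abnormal scan receives the higher score), and the DeLong test is not implemented here.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a fixed decision threshold.
    The default of 0.5 is an assumption, not a value taken from the study."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


if __name__ == "__main__":
    # Toy data only (1 = abnormal scan, 0 = normal scan); not study data.
    toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    print(roc_auc(toy_labels, toy_scores))
    print(threshold_metrics(toy_labels, toy_scores))
```

AUC is independent of the decision threshold, whereas accuracy, sensitivity, and specificity all depend on it, which is one reason ROC-based comparison is commonly used when ranking models.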
RESULTS: DenseNet121_AA (DenseNet121 with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. On average, in the first analysis, the Inception V3 and InceptionResNetV2 CNN models and dual-view input with the AA aggregation method performed best. In the second analysis, the DenseNet121 and InceptionResNetV2 CNN models and dual-view input with the AA aggregation method achieved the best results. Compared with the human observers, the AI models performed significantly better in the first analysis, whereas performance was comparable in the second analysis, although the AI model assessed the scans in drastically less time.

CONCLUSION: The models designed in this study are a positive step toward improving and optimizing WBS interpretation. By training DL models on larger and more diverse cohorts, AI could potentially be used to assist physicians in the assessment of WBS images.
