Verdi Elvan Burak, Yılmaz Muhammed, Doğan Mülazimoğlu Deniz, Türker Abdussamet, Gürün Kaya Aslıhan, Işık Özlem, Bostanoğlu Karaçin Aslı, Velioğlu Yakut Övgü, Yenigün Bülent Mustafa, Uzun Çağlar, Elhan Atilla Halil, Özdemir Kumbasar Özlem, Kaya Akın, Kayı Cangır Ayten, Taşçı Cantürk, Özbayoğlu Ahmet Murat, Erol Serhat
Department of Chest Diseases, Ankara University Faculty of Medicine, Ankara, Turkey.
School of Engineering, TOBB University of Economics and Technology, Ankara, Turkey.
J Investig Med. 2024 Jan;72(1):88-99. doi: 10.1177/10815589231208479. Epub 2023 Nov 10.
The generalizability of artificial intelligence (AI) models is a major issue in the field of AI applications. Therefore, we aimed to overcome the generalizability problem of a pneumothorax detection AI model developed at one center by using a small dataset from a second center for external validation. Chest radiographs of patients diagnosed with pneumothorax (n = 648) and those without pneumothorax (n = 650) who visited the Ankara University Faculty of Medicine (AUFM; center 1) were obtained. A deep learning-based pneumothorax detection algorithm (PDA-Alpha) was developed using the AUFM dataset. For implementation at the Health Sciences University (HSU; center 2), PDA-Beta was developed through external validation of PDA-Alpha using 50 radiographs with pneumothorax obtained from HSU. Both PDA algorithms were assessed using the HSU test dataset (n = 200) containing 50 pneumothorax and 150 non-pneumothorax radiographs. We compared the results generated by the algorithms with those of physicians to demonstrate the reliability of the results. The areas under the curve for PDA-Alpha and PDA-Beta were 0.993 (95% confidence interval (CI): 0.985-1.000) and 0.986 (95% CI: 0.962-1.000), respectively. Both algorithms detected pneumothorax on 49/50 radiographs; however, PDA-Alpha had seven false-positive predictions, whereas PDA-Beta had one. The positive predictive value increased from 0.525 to 0.886 after external validation (p = 0.041). The physicians' sensitivity and specificity for detecting pneumothorax were 0.585 and 0.988, respectively. The algorithm's performance improved after external validation with a small dataset; however, further studies are required to determine the optimal amount of external validation data to fully address the generalizability issue.
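The reported counts can be checked with a short worked calculation. The sketch below derives sensitivity and specificity from the HSU test set counts given in the abstract (49/50 detected; seven and one false positives among 150 non-pneumothorax radiographs). The raw test-set positive predictive values come out at 0.875 and 0.980, which differ from the reported 0.525 and 0.886. Applying Bayes' rule with a pneumothorax prevalence of 5% does reproduce the reported figures. That prevalence is an assumption inferred from the arithmetic; the abstract does not state it, and the function names here are illustrative only.

```python
def rates(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# ASSUMPTION: 5% prevalence is not stated in the abstract; it is the value
# that happens to reproduce the reported PPVs.
ASSUMED_PREVALENCE = 0.05

# HSU test set from the abstract: 50 pneumothorax, 150 non-pneumothorax.
sens_alpha, spec_alpha = rates(tp=49, fn=1, fp=7, tn=143)
sens_beta, spec_beta = rates(tp=49, fn=1, fp=1, tn=149)

# PPV computed directly from the test-set counts (25% prevalence in the set).
raw_ppv_alpha = 49 / (49 + 7)
raw_ppv_beta = 49 / (49 + 1)

# PPV re-weighted to the assumed prevalence.
ppv_alpha = ppv_at_prevalence(sens_alpha, spec_alpha, ASSUMED_PREVALENCE)
ppv_beta = ppv_at_prevalence(sens_beta, spec_beta, ASSUMED_PREVALENCE)

print(f"PDA-Alpha: sens={sens_alpha:.3f} spec={spec_alpha:.3f} "
      f"raw PPV={raw_ppv_alpha:.3f} adjusted PPV={ppv_alpha:.3f}")
print(f"PDA-Beta:  sens={sens_beta:.3f} spec={spec_beta:.3f} "
      f"raw PPV={raw_ppv_beta:.3f} adjusted PPV={ppv_beta:.3f}")
```

Because sensitivity is identical for both algorithms, the entire gain in positive predictive value comes from the drop in false positives, and the effect is magnified at low prevalence, where false positives dominate the positive calls.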