Wang Yuqi, Gupta Aarzu, Tushar Fakrul Islam, Riley Breylon, Wang Avivah, Tailor Tina D, Tantum Stacy, Liu Jian-Guo, Bashir Mustafa R, Lo Joseph Y, Lafata Kyle J
Department of Electrical and Computer Engineering, Duke University, Durham, NC, United States of America.
Artif Intell Med. 2025 Feb;160:103055. doi: 10.1016/j.artmed.2024.103055. Epub 2024 Dec 16.
In this paper, we introduce a novel concordance-based predictive uncertainty (CPU)-Index, which integrates insights from subgroup analysis and personalized AI time-to-event models. We demonstrate its effectiveness by applying it to refine lung cancer screening (LCS) predictions generated by an individualized AI time-to-event model trained on low-dose CT (LDCT) radiomics fused with patient demographics. This yields improved risk assessment compared with the Lung CT Screening Reporting & Data System (Lung-RADS). Subgroup-based Lung-RADS struggles to represent individual variation and relies on a limited set of predefined characteristics, resulting in variable predictions. Conversely, personalized AI time-to-event models are hindered by limited transparency and by biases from censored data. By measuring the prediction consistency between subgroup analysis and AI time-to-event models, the CPU-Index framework offers a nuanced evaluation of the bias-variance trade-off and improves the transparency and reliability of predictions. Consistency was estimated as the concordance index between the subgroup analysis-based similarity rank and the model prediction similarity rank. The subgroup analysis-based similarity loss was defined as the sum of the differences between Lung-RADS and feature-level 0-1 loss. The model prediction similarity loss was defined as the squared loss. To test our approach, we identified 3,326 patients who underwent LDCT for LCS from 1/1/2015 to 6/30/2020 with confirmation of lung cancer on pathology within one year. For each LDCT image, the lesion associated with a Lung-RADS score was detected using a pretrained deep learning model from the Medical Open Network for AI (MONAI), and radiomic features were extracted from that lesion. The radiomic features were optimally fused with patient demographics via a positional encoding scheme and used to train a neural multi-task logistic regression time-to-event model that predicts malignancy.
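The consistency measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that Lung-RADS categories are mapped to ordinal integers and that the feature-level 0-1 loss is the count of mismatched categorical features. It further assumes that the subgroup similarity loss simply adds the two, and that each patient's model output is reduced to one scalar predicted risk. All function names (`subgroup_loss`, `model_loss`, `concordance`, `cpu_index`) are hypothetical.

```python
from itertools import combinations


def subgroup_loss(rads_a, feats_a, rads_b, feats_b):
    """Hypothetical subgroup-analysis similarity loss between two patients.

    Assumption: Lung-RADS categories (e.g. 2, 3, 4A, 4B) are already mapped to
    ordinal integers, and the loss is the absolute Lung-RADS difference plus a
    feature-level 0-1 loss (number of categorical features that differ).
    """
    mismatches = sum(1 for x, y in zip(feats_a, feats_b) if x != y)
    return abs(rads_a - rads_b) + mismatches


def model_loss(risk_a, risk_b):
    """Squared difference between two scalar predicted risks (assumed model output)."""
    return (risk_a - risk_b) ** 2


def concordance(a, b):
    """Fraction of pairs ranked the same way by losses a and b.

    Pairs tied in a are not comparable and are skipped; ties in b count 0.5.
    """
    concordant, comparable = 0.0, 0
    for j, k in combinations(range(len(a)), 2):
        if a[j] == a[k]:
            continue
        comparable += 1
        da, db = a[j] - a[k], b[j] - b[k]
        if db == 0:
            concordant += 0.5
        elif (da > 0) == (db > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")


def cpu_index(rads, feats, risk, i):
    """Hypothetical per-patient consistency score for patient i.

    Ranks every other patient by similarity to i twice (subgroup loss and
    model loss) and returns the concordance between the two rankings.
    """
    others = [j for j in range(len(risk)) if j != i]
    a = [subgroup_loss(rads[i], feats[i], rads[j], feats[j]) for j in others]
    b = [model_loss(risk[i], risk[j]) for j in others]
    return concordance(a, b)


# Toy example with made-up values: four patients, two categorical features.
rads = [2, 2, 3, 4]
feats = [(0, 1), (0, 1), (1, 1), (1, 0)]
print(cpu_index(rads, feats, [0.1, 0.12, 0.4, 0.8], 0))  # → 1.0
```

In the toy example, the model's risk ordering agrees with the subgroup ordering for every comparable pair, so the score is 1.0. Lower values would flag a prediction whose neighbourhood disagrees with the subgroup view. The quadratic pair loop is kept for readability rather than speed.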
Performance was maximized when radiomic features were fused with positionally encoded demographic features. In this configuration, our algorithm raised the AUC from 0.81 ± 0.04 to 0.89 ± 0.02. Compared with standard Lung-RADS, our approach reduced the false-positive rate from 0.41 ± 0.02 to 0.30 ± 0.12 while maintaining the same false-negative rate. Our methodology enhances lung cancer risk assessment by estimating prediction uncertainty and adjusting predictions accordingly. Furthermore, the optimal integration of radiomics and patient demographics improved overall diagnostic performance, indicating that the two data sources are complementary.
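The abstract does not specify which positional encoding scheme was used. The sketch below therefore substitutes, as an assumption only, the standard Transformer-style sinusoidal encoding. It maps each demographic value to a fixed-length vector and concatenates the result with the radiomic feature vector. The helper names (`sinusoidal_encoding`, `fuse`), the encoding dimension, and the example values are hypothetical.

```python
import math


def sinusoidal_encoding(value, dim=8, max_period=10000.0):
    """Transformer-style sinusoidal encoding of one scalar into `dim` values.

    Even positions hold sin(value * w_k), odd positions cos(value * w_k),
    with w_k = 1 / max_period ** (2k / dim) for k = 0 .. dim/2 - 1.
    """
    if dim % 2:
        raise ValueError("dim must be even")
    enc = []
    for k in range(dim // 2):
        angle = value / max_period ** (2 * k / dim)
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def fuse(radiomics, demographics, dim=8):
    """Concatenate a patient's radiomic features with encoded demographics.

    Hypothetical helper: each demographic value (e.g. age) gets its own
    `dim`-length encoding, so the fused vector has
    len(radiomics) + dim * len(demographics) entries.
    """
    fused = list(radiomics)
    for value in demographics:
        fused.extend(sinusoidal_encoding(value, dim))
    return fused


# Example with made-up values: 5 radiomic features plus age and a binary sex indicator.
vec = fuse([0.2, 1.3, -0.7, 0.05, 2.1], [63, 1], dim=8)
print(len(vec))  # → 21
```

One plausible benefit of this kind of encoding is that a scalar such as age enters the model as several bounded, multi-frequency inputs rather than a single raw number. Whether this matches the authors' scheme is not stated in the abstract.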