Research Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Rome, Italy.
Operative Research Unit of Radiation Oncology, Fondazione Policlinico Universitario Campus Bio-Medico, Rome, Italy.
Comput Methods Programs Biomed. 2024 Sep;254:108308. doi: 10.1016/j.cmpb.2024.108308. Epub 2024 Jun 28.
In lung cancer research, and in particular in the analysis of overall survival (OS), artificial intelligence (AI) can play a crucial role. Because missing data are pervasive in the medical domain, our primary objective is to develop an AI model that handles missing values dynamically. We also aim to use all available data, analyzing both uncensored patients, who have experienced the event of interest, and censored patients, who have not. To do this, we embed in our model a specialized technique that is not commonly used in other AI tasks. By meeting these objectives, the model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients despite these challenges.
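The abstract does not spell out the authors' loss functions. As a hedged illustration of how a loss can learn from censored patients instead of discarding them, the sketch below uses a standard discrete-time hazard likelihood. This is a swapped-in textbook technique, not the paper's loss. Follow-up is split into time units, and the model is assumed to output a hazard h_k per unit, meaning the probability of the event in unit k given survival up to it. An uncensored patient contributes the probability of surviving every earlier unit and then having the event. A censored patient contributes only the probability of surviving through the last observed unit. Because each unit has its own hazard, the predicted risk can also change over time. The function names are hypothetical.

```python
import math


def discrete_time_nll(hazards, event_interval, observed):
    """Negative log-likelihood of one patient under a discrete-time hazard model.

    hazards: per-time-unit hazards h_k, each strictly between 0 and 1.
    event_interval: index of the time unit with the event (observed=True)
        or with the last follow-up (observed=False, i.e. censored).
    """
    nll = 0.0
    # Surviving every time unit before event_interval contributes log(1 - h_k).
    for k in range(event_interval):
        nll -= math.log(1.0 - hazards[k])
    if observed:
        # Uncensored patient: the event occurs in event_interval.
        nll -= math.log(hazards[event_interval])
    else:
        # Censored patient: we only know they survived event_interval
        # (assumption: censoring is placed at the end of that unit).
        nll -= math.log(1.0 - hazards[event_interval])
    return nll


def batch_loss(batch):
    """Mean NLL over (hazards, event_interval, observed) triples."""
    return sum(discrete_time_nll(*p) for p in batch) / len(batch)
```

For hazards [0.1, 0.2, 0.5], a patient censored in unit 1 contributes -log(0.9) - log(0.8) ≈ 0.33, while a patient with the event in unit 1 contributes -log(0.9) - log(0.2) ≈ 1.71. The censored patient still provides information, because the loss rewards low hazards over the time the patient was observed alive.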
We present a novel approach to survival analysis with missing values in NSCLC. It exploits the strengths of the transformer architecture to consider only the available features, without requiring any imputation strategy. Specifically, the model tailors the transformer to tabular data by adapting its feature embedding and its masked self-attention, so that missing values are masked out and the available ones are fully exploited. Through losses designed specifically for OS, it accounts for both censored and uncensored patients, as well as for changes in risk over time.
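The sketch below is a rough illustration of the masking idea, not the authors' architecture. Each tabular feature becomes its own token through a per-feature linear embedding. Missing features are excluded from self-attention by giving their keys a score of negative infinity, so they receive exactly zero attention weight and no imputed value is ever needed. Several parts are hypothetical simplifications: the single head, the identity query/key/value projections, the mean pooling, and all function names. The sketch also assumes that at least one feature per patient is available.

```python
import math


def embed_features(values, weights, biases):
    """Feature-wise embedding: token_j = x_j * w_j + b_j; None marks a missing value."""
    tokens = []
    for x, w, b in zip(values, weights, biases):
        if x is None:
            tokens.append([0.0] * len(w))  # placeholder, masked out below
        else:
            tokens.append([x * wi + bi for wi, bi in zip(w, b)])
    return tokens


def masked_self_attention(tokens, available):
    """Single-head self-attention with identity Q/K/V projections.

    Keys of missing features get a score of -inf, hence weight exactly 0,
    so a missing feature never influences any output row.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        if not available[i]:
            out.append([0.0] * d)  # missing query: emit a zero row
            continue
        scores = [
            sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d) if available[j] else -math.inf
            for j, k in enumerate(tokens)
        ]
        m = max(scores)  # finite, since token i itself is available
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        z = sum(exps)
        attn = [e / z for e in exps]
        out.append([sum(a * v[c] for a, v in zip(attn, tokens)) for c in range(d)])
    return out


def pooled_representation(values, weights, biases):
    """Mean-pool the attended tokens of the available features only."""
    available = [v is not None for v in values]
    attended = masked_self_attention(embed_features(values, weights, biases), available)
    kept = [row for row, a in zip(attended, available) if a]
    return [sum(col) / len(kept) for col in zip(*kept)]
```

Whatever is stored at a missing position, the outputs for the available features are the same as if that feature had never been in the input. This is the property that removes the need for an imputation step.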
We compared our method with state-of-the-art survival analysis models coupled with different imputation strategies. We evaluated the results over a 6-year period at different time granularities. Our model obtained a Ct-index (a time-dependent variant of the C-index) of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.
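The abstract defines the Ct-index only as a time-dependent variant of the C-index. The sketch below assumes it follows the usual Antolini-style time-dependent concordance. Under that definition, a pair is comparable when patient i has an observed event at time t_i and patient j is still at risk after t_i. The pair is concordant when the model assigns i the lower predicted survival at t_i. The sketch works on a discrete grid of time units, matching the granularities in the abstract. Three details are assumptions: the function name, the input layout, and counting tied predictions as 0.5.

```python
def ct_index(times, events, surv):
    """Antolini-style time-dependent concordance on a discrete time grid.

    times[i]: time-unit index of the event or censoring for patient i.
    events[i]: True if the event was observed (uncensored).
    surv[i][t]: predicted probability that patient i survives past unit t.
    Returns NaN when no pair is comparable.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # only an observed event can anchor a comparable pair
        t = times[i]
        for j in range(n):
            if j == i:
                continue
            # j must still be at risk after i's event: a later time,
            # or the same time unit but censored.
            if times[j] > t or (times[j] == t and not events[j]):
                comparable += 1
                if surv[i][t] < surv[j][t]:
                    concordant += 1.0
                elif surv[i][t] == surv[j][t]:
                    concordant += 0.5  # assumption: ties count as half
    return concordant / comparable if comparable else float("nan")
```

The index lies between 0 and 1, so the figures in the abstract (e.g. 71.97) are presumably this ratio expressed on a 0 to 100 scale.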
The results show that our model not only outperforms the state of the art but also simplifies the analysis in the presence of missing data, since it eliminates the need to identify the most appropriate imputation strategy for predicting OS in NSCLC patients.