Institute of Biomedical Engineering, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China.
Engineering Research Center of Pulmonary and Critical Care Medicine Technology and Device, Ministry of Education, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China.
Sci Rep. 2024 Apr 16;14(1):8731. doi: 10.1038/s41598-024-59474-w.
Early diagnosis of lung cancer (LC) can significantly reduce its mortality rate. Because computed tomography (CT)-based diagnosis has a high false-positive rate and relies on radiologists' experience, a multi-modal early LC screening model that combines radiology with other non-invasive, rapid detection methods is warranted. A high-resolution, multi-modal, and low-differentiation LC screening strategy named ensemble text and breath analysis (ETBA) is proposed that combines radiology report text analysis with breath analysis. In total, 231 participants (140 patients with LC and 91 patients with benign lesions [BL]) were examined using proton transfer reaction-time of flight-mass spectrometry and CT screening. Participants were randomly assigned, with stratification, to a training set and a validation set (4:1). The report section of the radiology reports was used to train a text analysis (TA) model with a natural language processing algorithm. Twenty-two volatile organic compounds (VOCs) in exhaled breath and the prediction results of the TA model were used as predictors to develop the ETBA model with an extreme gradient boosting algorithm. A breath analysis (BA) model was developed based on the same 22 VOCs. The BA and TA models were compared with the ETBA model. On the validation set, the ETBA model achieved a sensitivity of 94.3%, a specificity of 77.3%, and an accuracy of 87.7%, whereas radiologist diagnosis achieved a sensitivity of 74.3%, a specificity of 59.1%, and an accuracy of 68.1%. The ETBA model therefore achieved higher sensitivity and specificity than radiologist diagnosis. The ETBA model has the potential to improve the sensitivity and specificity of CT screening for LC. This approach is rapid, non-invasive, multi-dimensional, and accurate for distinguishing LC from BL.
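The stratified 4:1 split and the three reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `stratified_split` and `screening_metrics`, the random seed, the rounding rule for per-class validation counts, and the toy predictions are all assumptions made for illustration. Only the class counts (140 LC, 91 BL) and the 4:1 ratio come from the abstract, so the split sizes this sketch produces need not match those used in the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split indices into train/validation, keeping class proportions.

    val_fraction=0.2 corresponds to a 4:1 train:validation ratio.
    The seed and the rounding rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


def screening_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate (LC detected)
        "specificity": tn / (tn + fp),   # true negative rate (BL recognised)
        "accuracy": (tp + tn) / len(pairs),
    }


# Class counts from the abstract: 1 = LC (140), 0 = BL (91).
labels = [1] * 140 + [0] * 91
train_idx, val_idx = stratified_split(labels)

# Toy predictions (hypothetical, NOT study data) to show the metric formulas.
toy_true = [1, 1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1]
toy_metrics = screening_metrics(toy_true, toy_pred)
```

In a pipeline like the one described, the validation-set predictions of a fitted classifier would be passed to `screening_metrics` in place of the toy lists.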