Logullo Patricia, MacCarthy Angela, Dhiman Paula, Kirtley Shona, Ma Jie, Bullock Garrett, Collins Gary S
Department of Orthopaedic Surgery, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States.
BJR Open. 2023 Jun 6;5(1):20220033. doi: 10.1259/bjro.20220033. eCollection 2023.
This study aimed to describe the methodologies used to develop and evaluate models that use artificial intelligence (AI) to analyse lung images in order to detect, segment (outline borders of), or classify pulmonary nodules as benign or malignant.
In October 2019, we systematically searched the literature for original studies published between 2018 and 2019 that described prediction models using AI to evaluate human pulmonary nodules on diagnostic chest images. Two evaluators independently extracted information from studies, such as study aims, sample size, AI type, patient characteristics, and performance. We summarised data descriptively.
The review included 153 studies: 136 (89%) development-only studies, 12 (8%) development and validation, and 5 (3%) validation-only. CT scans were the most common image type used (83%), often acquired from public databases (58%). Eight studies (5%) compared model outputs with biopsy results, and 41 studies (26.8%) reported patient characteristics. The models were based on different units of analysis, such as patients, images, nodules, or image slices or patches.
The methods used to develop and evaluate prediction models using AI to detect, segment, or classify pulmonary nodules in medical imaging vary, are poorly reported, and are therefore difficult to evaluate. Transparent and complete reporting of methods, results, and code would fill the gaps in information we observed in the study publications.
We reviewed the methodology of AI models detecting nodules on lung images and found that the models were poorly reported, often lacking descriptions of patient characteristics, with just a few comparing model outputs with biopsy results. When lung biopsy is not available, Lung-RADS could help standardise comparisons between the human radiologist and the machine. The field of radiology should not abandon principles of diagnostic accuracy studies, such as the choice of a correct ground truth, simply because AI is used. Clear and complete reporting of the reference standard used would help radiologists trust the performance that AI models claim to have. This review presents clear recommendations on the essential methodological aspects of diagnostic models that should be incorporated in studies using AI to detect or segment lung nodules. The manuscript also reinforces the need for more complete and transparent reporting, which the recommended reporting guidelines can help achieve.