Amara Tariq, Saptarshi Purkayastha, Geetha Priya Padmanaban, Elizabeth Krupinski, Hari Trivedi, Imon Banerjee, Judy Wawira Gichoya
Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia.
School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana.
J Am Coll Radiol. 2020 Nov;17(11):1371-1381. doi: 10.1016/j.jacr.2020.08.018.
Despite tremendous gains from deep learning and the promise of artificial intelligence (AI) in medicine to improve diagnosis and reduce costs, a large translational gap remains in implementing and using AI products in real-world clinical settings. Adoption of reporting standards such as the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD), the Consolidated Standards of Reporting Trials (CONSORT), and the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) is increasing to improve the peer-review process and reporting of AI tools. However, no such standards exist for product-level review.
A review of clinical trials showed a paucity of evidence for radiology AI products; the authors therefore developed a 10-question assessment tool for reviewing AI products, with an emphasis on their validation and result dissemination. The assessment tool was applied to commercial and open-source diagnostic algorithms to extract evidence on their clinical utility.
Compared with open-source products, FDA-approved algorithms offer limited technical information on their methodologies, likely because of intellectual property concerns. Furthermore, FDA-approved products use much smaller data sets than open-source AI tools, because the terms of use of public data sets are limited to academic and noncommercial entities, precluding their use in commercial products.
Overall, this study reveals a broad spectrum of maturity and clinical use among AI products, but a large gap remains in evaluating the actual performance of AI tools in clinical practice.