Guleria Shan, Schwartz Benjamin, Sharma Yash, Fernandes Philip, Jablonski James, Adewole Sodiq, Srivastava Sanjana, Rhoads Fisher, Porter Michael, Yeghyayan Michelle, Hyatt Dylan, Copland Andrew, Ehsan Lubaina, Brown Donald, Syed Sana
Rush University Medical Center, Department of Internal Medicine. Chicago, IL 60607.
University of Virginia, Systems and Information Engineering. Charlottesville, VA 22903.
ArXiv. 2023 Aug 24:arXiv:2308.13035v1.
Technical burdens and time-intensive review processes limit the practical utility of video capsule endoscopy (VCE). Artificial intelligence (AI) is poised to address these limitations, but the intersection of AI and VCE reveals challenges that must first be overcome. We identified five challenges to address. Challenge #1: VCE data are stochastic and contain significant artifact. Challenge #2: VCE interpretation is cost-intensive. Challenge #3: VCE data are inherently imbalanced. Challenge #4: Existing VCE artificial intelligence and machine learning techniques (AIMLT) are computationally cumbersome. Challenge #5: Clinicians are hesitant to accept AIMLT that cannot explain their decision-making process.
An anatomic landmark detection model was used to test the application of convolutional neural networks (CNNs) to the task of classifying VCE data. We also created a tool that assists in expert annotation of VCE data. We then created more elaborate models using different approaches including a multi-frame approach, a CNN based on graph representation, and a few-shot approach based on meta-learning.
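The few-shot, meta-learning approach described above is not specified in code here; as an illustrative sketch only (not the authors' implementation), the prototypical-network idea behind many few-shot classifiers can be shown with numpy: embed a handful of labeled support frames per class, average them into class prototypes, and assign each query frame to the nearest prototype. The embeddings below are random stand-ins for CNN features.

```python
import numpy as np

def prototypes(support_embeddings, support_labels):
    """Mean embedding per class, computed from the few labeled support frames."""
    classes = np.unique(support_labels)
    protos = np.stack([
        support_embeddings[support_labels == c].mean(axis=0) for c in classes
    ])
    return classes, protos

def classify(query_embeddings, classes, protos):
    """Assign each query frame to the nearest class prototype (Euclidean distance)."""
    # Pairwise distances, shape (n_query, n_classes)
    d = np.linalg.norm(query_embeddings[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-way, 3-shot episode with 2-D embeddings (stand-ins for CNN features)
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0.0, 0.1, (3, 2)),   # class 0 cluster near (0, 0)
                     rng.normal(2.0, 0.1, (3, 2))])  # class 1 cluster near (2, 2)
labels = np.array([0, 0, 0, 1, 1, 1])
classes, protos = prototypes(support, labels)

queries = np.array([[0.05, -0.02], [1.95, 2.10]])
print(classify(queries, classes, protos))  # → [0 1]
```

Because only the class prototypes must be stored, such models stay lightweight and adapt to rare findings from a few annotated examples, which is the appeal of few-shot methods for imbalanced VCE data.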
When used on full-length VCE footage, CNNs accurately identified anatomic landmarks (99.1%), with gradient-weighted class activation mapping (Grad-CAM) showing the parts of each frame that the CNN used to make its decision. The graph CNN with weakly supervised learning (accuracy 89.9%, sensitivity 91.1%), the few-shot model (accuracy 90.8%, precision 91.4%, sensitivity 90.9%), and the multi-frame model (accuracy 97.5%, precision 91.5%, sensitivity 94.8%) performed well.
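The core arithmetic of gradient-weighted class activation mapping is simple enough to sketch. As a minimal illustration (assumed, not the authors' code): each convolutional feature map is weighted by the spatial average of the class score's gradient over that map, the weighted maps are summed, and a ReLU keeps only regions that positively support the predicted class.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from the last conv layer's activations and gradients.

    feature_maps, gradients: arrays of shape (channels, H, W).
    """
    alphas = gradients.mean(axis=(1, 2))              # one importance weight per channel
    cam = np.tensordot(alphas, feature_maps, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0)                          # ReLU: keep positive evidence only
    return cam / cam.max() if cam.max() > 0 else cam  # normalize to [0, 1]

# Toy example: 2 channels on a 4x4 grid.
# Channel 0 fires top-left, channel 1 fires bottom-right.
fmap = np.zeros((2, 4, 4))
fmap[0, :2, :2] = 1.0
fmap[1, 2:, 2:] = 1.0
# The class score's gradient favors channel 0 and penalizes channel 1.
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])

heat = grad_cam(fmap, grads)
print(heat[:2, :2].max(), heat[2:, 2:].max())  # → 1.0 0.0 (top-left hot, bottom-right suppressed)
```

Overlaying such a heatmap on the original frame lets a clinician see which pixels drove the classification, directly targeting the explainability concern raised in Challenge #5.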
Each of these five challenges is addressed, in part, by one of our AI-based models. We achieved our goal of high performance with lightweight models designed to improve clinician confidence.