Department of Small Animal Clinical Sciences, University of Tennessee, Knoxville, Tennessee, USA.
PicoxIA, Maisons-Alfort, France.
Vet Radiol Ultrasound. 2022 Jul;63(4):456-468. doi: 10.1111/vru.13069. Epub 2022 Feb 8.
Convolutional neural networks (CNNs) are commonly used as artificial intelligence (AI) tools for evaluating radiographs, but published studies testing their performance in veterinary patients are currently lacking. The purpose of this retrospective, secondary-analysis diagnostic accuracy study was to compare the error rates of four CNNs with those of 13 veterinary radiologists in evaluating canine thoracic radiographs against an independent gold standard. Radiographs acquired at a referral institution were used to evaluate the four CNNs, which shared a common architecture. Fifty radiographic studies were selected at random. Three board-certified veterinary radiologists independently evaluated each study for the presence or absence of 15 thoracic labels, and the gold standard was created from their reads by majority rule. The labels were grouped into "cardiovascular," "pulmonary," "pleural," "airway," and "other" categories. The error rates of each CNN and of 13 additional board-certified veterinary radiologists were then calculated on those same studies. For most labels, there was no statistically significant difference in error rates among the four CNNs; however, the CNN training method affected the overall error rate for three of the 15 labels. The veterinary radiologists had a significantly lower error rate than all four CNNs, both overall and for five labels (33%). For only one label ("esophageal dilation") were two of the CNNs superior to the veterinary radiologists. These findings raise numerous questions that must be addressed to further develop and standardize AI in veterinary radiology and to optimize patient care.
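The gold-standard and error-rate steps described above can be illustrated with a minimal sketch. It assumes that each read of a study is a set of binary present/absent decisions per label and that an error is any disagreement with the majority-rule gold standard. The abstract does not give the exact error-rate definition, so this is an assumption. The function names, the `Read` type, and the label `label_b` are hypothetical, the toy reads are invented (not study data), and the statistical comparisons reported in the paper are not reproduced.

```python
from typing import Dict, List

# Hypothetical representation: one read of one study maps
# label name -> finding judged present (True) or absent (False).
Read = Dict[str, bool]


def majority_gold_standard(reads: List[Read]) -> Read:
    """Combine an odd number of independent reads of one study by majority rule."""
    if len(reads) % 2 == 0:
        raise ValueError("an odd number of readers avoids ties")
    # A label is "present" in the gold standard if more than half the readers marked it present.
    return {label: sum(r[label] for r in reads) > len(reads) / 2 for label in reads[0]}


def label_error_rate(predicted: List[Read], gold: List[Read], label: str) -> float:
    """Assumed definition: fraction of studies whose prediction for `label` disagrees with the gold standard."""
    errors = sum(p[label] != g[label] for p, g in zip(predicted, gold))
    return errors / len(gold)


def overall_error_rate(predicted: List[Read], gold: List[Read]) -> float:
    """Assumed definition: fraction of all (study, label) decisions that disagree with the gold standard."""
    decisions = [(p[k], g[k]) for p, g in zip(predicted, gold) for k in g]
    return sum(a != b for a, b in decisions) / len(decisions)


if __name__ == "__main__":
    # Toy, made-up reads from three readers for two studies (not data from the paper).
    study_reads = [
        [{"esophageal dilation": True, "label_b": False},
         {"esophageal dilation": True, "label_b": True},
         {"esophageal dilation": False, "label_b": False}],
        [{"esophageal dilation": False, "label_b": True},
         {"esophageal dilation": False, "label_b": True},
         {"esophageal dilation": False, "label_b": False}],
    ]
    gold = [majority_gold_standard(reads) for reads in study_reads]
    # Hypothetical outputs of one classifier on the same two studies.
    cnn = [{"esophageal dilation": True, "label_b": True},
           {"esophageal dilation": False, "label_b": True}]
    print(label_error_rate(cnn, gold, "label_b"))  # 0.5
    print(overall_error_rate(cnn, gold))  # 0.25
```

With three readers, a label enters the gold standard only when at least two of them marked it present. The same error-rate functions can then be applied unchanged to each CNN and to each additional radiologist, so every reader type is scored against the same reference.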