Frederiksen Bill Aplin, Hammer Hilde Berner, Terslev Lene, Ammitzbøll-Danielsen Mads, Savarimuthu Thiusius Rajeeth, Weber Anders Bossel Holst, Just Søren Andreas
Section of Rheumatology, Department of Medicine, Svendborg Hospital - Odense University Hospital, Svendborg, Denmark.
Center for Treatment of Rheumatic and Musculoskeletal Diseases (REMEDY), Diakonhjemmet Hospital, Oslo, Norway.
RMD Open. 2025 Aug 5;11(3):e005805. doi: 10.1136/rmdopen-2025-005805.
To evaluate the agreement and repeatability of an automated robotic ultrasound system (ARTHUR V.2.0) combined with an AI model (DIANA V.2.0) in assessing synovial hypertrophy (SH) and Doppler activity in patients with rheumatoid arthritis (RA), using an expert rheumatologist's assessment as the reference standard.
Thirty RA patients underwent two consecutive ARTHUR V.2.0 scans and a rheumatologist's assessment of 22 hand joints, with the rheumatologist blinded to the automated system's results. Images were scored for SH and Doppler by DIANA V.2.0 using the EULAR-OMERACT scale (0-3). Agreement was evaluated using weighted Cohen's kappa, percent exact agreement (PEA), percent close agreement (PCA) and binary outcomes based on the Global OMERACT-EULAR Synovitis Scoring (healthy ≤1 vs diseased ≥2). Comparisons included intra-robot repeatability and agreement with the expert rheumatologist and with a blinded independent assessor.
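The four agreement measures named above can be illustrated with a minimal sketch. It is not the study's analysis code, and several details are assumptions: the function name `agreement_metrics` is hypothetical, PCA is taken to mean scores within one grade of each other, and the kappa weighting (linear by default here, quadratic as an option) is not specified in the abstract.

```python
from collections import Counter


def agreement_metrics(rater_a, rater_b, n_grades=4, weights="linear"):
    """Agreement between two raters scoring the same joints on a 0-3 scale.

    Assumptions (not stated in the abstract): PCA counts scores that differ
    by at most one grade, and kappa uses linear or quadratic disagreement
    weights.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of joints")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(rater_a)
    pairs = list(zip(rater_a, rater_b))

    # Percent exact agreement: identical grades.
    pea = 100.0 * sum(a == b for a, b in pairs) / n
    # Percent close agreement: grades differ by at most one (assumed definition).
    pca = 100.0 * sum(abs(a - b) <= 1 for a, b in pairs) / n
    # Binary agreement: healthy (<=1) vs diseased (>=2), as in the abstract.
    binary = 100.0 * sum((a >= 2) == (b >= 2) for a, b in pairs) / n

    # Weighted Cohen's kappa via disagreement weights:
    # kappa = 1 - (observed weighted disagreement / chance-expected disagreement).
    power = 1 if weights == "linear" else 2

    def disagreement(i, j):
        return (abs(i - j) / (n_grades - 1)) ** power

    observed = sum(disagreement(a, b) for a, b in pairs) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        disagreement(i, j) * count_a[i] * count_b[j]
        for i in range(n_grades)
        for j in range(n_grades)
    ) / (n * n)
    # Kappa is undefined when both raters give every joint the same single grade.
    kappa = 1.0 - observed / expected if expected > 0 else float("nan")

    return {"PEA": pea, "PCA": pca, "binary": binary, "kappa": kappa}


if __name__ == "__main__":
    # Invented scores for illustration only; not data from the study.
    robot = [0, 1, 2, 3, 1, 0, 2, 1]
    expert = [0, 1, 1, 3, 2, 0, 2, 0]
    print(agreement_metrics(robot, expert))
```

Because most disagreements on a four-point scale are off by a single grade, PCA and binary agreement tend to sit well above PEA, while kappa corrects for the agreement expected by chance and is therefore sensitive to how the grades are distributed.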
ARTHUR successfully scanned 564 of 660 joints, an overall success rate of 85.5%. Intra-robot agreement: SH (PEA 63.0%, PCA 93.0%, binary 90.5%, kappa 0.54); Doppler (PEA 74.8%, PCA 93.7%, binary 88.1%, kappa 0.49). Agreement between ARTHUR+DIANA and the rheumatologist: SH (PEA 57.9%, PCA 92.9%, binary 87.3%, kappa 0.38); Doppler (PEA 77.3%, PCA 94.2%, binary 91.2%, kappa 0.44). Agreement with the independent assessor: SH (PEA 49.0%, PCA 91.2%, binary 80.0%, kappa 0.39); Doppler (PEA 62.6%, PCA 94.4%, binary 88.1%, kappa 0.48).
ARTHUR V.2.0 and DIANA V.2.0 demonstrated repeatability on par with intra-expert agreement reported in the literature and showed encouraging agreement with human assessors, though further refinement is needed to optimise performance across specific joints.