Intra- and interobserver agreement of proposed objective transvaginal ultrasound image-quality scoring system for use in artificial intelligence algorithm development.

Author information

Deslandes A, Avery J C, Chen H-T, Leonardi M, Knox S, Lo G, O'Hara R, Condous G, Hull M L

Affiliations

Robinson Research Institute, University of Adelaide, Adelaide, Australia.

School of Computer and Mathematical Sciences, University of Adelaide, Adelaide, Australia.

Publication information

Ultrasound Obstet Gynecol. 2025 Mar;65(3):364-371. doi: 10.1002/uog.29178. Epub 2025 Jan 24.

DOI: 10.1002/uog.29178
PMID: 39854656
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11872342/
Abstract

OBJECTIVES

The development of valuable artificial intelligence (AI) tools to assist with ultrasound diagnosis depends on algorithms developed using high-quality data. This study aimed to test the intra- and interobserver agreement of a proposed image-quality scoring system to quantify the quality of gynecological transvaginal ultrasound (TVS) images, which could be used in clinical practice and AI tool development.

METHODS

A proposed scoring system to quantify TVS image quality was created following a review of the literature. This system involved a score of 1-4 (2 = poor, 3 = suboptimal and 4 = optimal image quality) assigned by a rater for individual ultrasound images. If the image was deemed inaccurate, it was assigned a score of 1, corresponding to 'reject'. Six professionals, including two radiologists, two sonographers and two sonologists, reviewed 150 images (50 images of the uterus and 100 images of the ovaries) obtained from 50 women, assigning each image a score of 1-4. The review of all images was repeated a second time by each rater after a period of at least 1 week. Mean scores were calculated for each rater. Overall interobserver agreement was assessed using the intraclass correlation coefficient (ICC), and interobserver agreement between paired professionals and intraobserver agreement for all professionals were assessed using weighted Cohen's kappa and ICC.
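The intraobserver comparison described above pairs each rater's first-session score for an image with the same rater's second-session score. Weighted Cohen's kappa is designed for exactly this kind of paired ordinal data. The abstract does not say which weighting scheme the study used. The sketch below therefore assumes linear disagreement weights, with quadratic weights available through a parameter. The names `weighted_kappa` and `mchugh_label` are hypothetical. The verbal bands in `mchugh_label` follow the widely cited McHugh (2012) kappa guidance. They agree with the abstract's "weak/moderate/strong" wording, but the abstract does not confirm them.

```python
"""Weighted Cohen's kappa for the 1-4 TVS image-quality scale (sketch)."""

# Score meanings as described in the abstract.
SCORES = {1: "reject", 2: "poor", 3: "suboptimal", 4: "optimal"}


def weighted_kappa(a, b, categories=tuple(SCORES), weighting="linear"):
    """Chance-corrected agreement between two equal-length lists of scores.

    Disagreement weights are |i - j| / (k - 1) ("linear") or their square
    ("quadratic"); which one the study used is an assumption here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    pos = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint distribution of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[pos[x]][pos[y]] += 1.0 / n

    # Row and column marginals define the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs_dis / exp_dis


def mchugh_label(kappa):
    """Verbal band for a kappa value (McHugh-style cut-offs, assumed)."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    if kappa >= 0.21:
        return "minimal"
    return "none"
```

In the study's design, `a` would hold one rater's 150 first-session scores and `b` the same rater's scores from the session at least 1 week later. Linear weights penalize a 4-versus-2 disagreement twice as much as a 4-versus-3 disagreement, which suits an ordered quality scale.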

RESULTS

Poor levels of interobserver agreement were obtained between the six raters for all 150 images (ICC, 0.480 (95% CI, 0.363-0.586)), as well as for assessment of the uterine images only (ICC, 0.359 (95% CI, 0.204-0.523)). Moderate agreement was achieved for the ovarian images (ICC, 0.531 (95% CI, 0.417-0.636)). Agreement between the paired sonographers and sonologists was poor for all images (ICC, 0.336 (95% CI, -0.078 to 0.619) and 0.425 (95% CI, 0.014-0.665), respectively), as well as when images were grouped into uterine images (ICC, 0.253 (95% CI, -0.097 to 0.577) and 0.299 (95% CI, -0.094 to 0.606), respectively) and ovarian images (ICC, 0.400 (95% CI, -0.043 to 0.669) and 0.469 (95% CI, 0.088-0.689), respectively). Agreement between the paired radiologists was moderate for all images (ICC, 0.600 (95% CI, 0.487-0.693)) and for their assessment of uterine images (ICC, 0.538 (95% CI, 0.311-0.707)) and ovarian images (ICC, 0.621 (95% CI, 0.483-0.728)). Weak-to-moderate intraobserver agreement was seen for each of the raters with weighted Cohen's kappa ranging from 0.533 to 0.718 for all images and from 0.467 to 0.751 for ovarian images. Similarly, for all raters, the ICC indicated moderate-to-good intraobserver agreement for all images overall (ICC ranged from 0.636 to 0.825) and for ovarian images (ICC ranged from 0.596 to 0.862). Slightly better intraobserver agreement was seen for uterine images, with weighted Cohen's kappa ranging from 0.568 to 0.808 indicating weak-to-strong agreement, and ICC ranging from 0.546 to 0.893 indicating moderate-to-good agreement. All measures were statistically significant (P < 0.001).
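The ICC values above, and the "poor/moderate/good" wording attached to them, can be reproduced in outline with a two-way ANOVA decomposition of an images-by-raters score matrix. The abstract does not state which ICC form was used. The sketch below assumes ICC(2,1): two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss notation. The names `icc_2_1` and `koo_li_label` are hypothetical. The cut-offs in `koo_li_label` follow Koo and Li (2016): below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, and above 0.9 excellent. These cut-offs are an assumption, although they agree with the labels used in the abstract. The sketch does not compute confidence intervals.

```python
"""ICC(2,1) for an images-by-raters score matrix (sketch)."""


def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    `scores` is a list of rows: one row per image, one column per rater.
    The choice of ICC(2,1) is an assumption, not taken from the study.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(r) != k for r in scores):
        raise ValueError("need a rectangular matrix with at least 2 rows and 2 columns")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: images (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


def koo_li_label(icc):
    """Verbal band for an ICC point estimate (Koo and Li cut-offs, assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"
```

For the six-rater analysis, `scores` would be a 150-by-6 matrix with one row per image. For a paired comparison, such as the two radiologists, it would be a 150-by-2 matrix. Absolute agreement, unlike consistency, counts a rater who scores every image one point lower as disagreeing. That is the relevant property when the scores are meant to serve as labels for AI training data.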

CONCLUSION

The proposed image quality scoring system was shown to have poor-to-moderate interobserver agreement and mostly weak-to-moderate levels of intraobserver agreement. More refinement of the scoring system may be needed to improve agreement, although it remains unclear whether quantification of image quality can be achieved, given the highly subjective nature of ultrasound interpretation. Although some AI systems can tolerate labeling noise, most will favor clean (high-quality) data. As such, innovative data-labeling strategies are needed. © 2025 The Author(s). Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8719/11872342/891eefe9f413/UOG-65-364-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8719/11872342/fe930cee911e/UOG-65-364-g002.jpg
