Outliers and anomalies in training and testing datasets for AI-powered morphometry-evidence from CT scans of the spleen.

Authors

Vasilev Yuriy, Pamova Anastasia, Bobrovskaya Tatiana, Vladzimirskyy Anton, Omelyanskaya Olga, Astapenko Elena, Kruchinkin Artem, Vladimir Novik, Arzamasov Kirill

Affiliations

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Health Care Department, Moscow, Russia.

National Medical and Surgical Center named after N.I. Pirogov of the Ministry of Health of the Russian Federation, Moscow, Russia.

Publication information

Front Artif Intell. 2025 Jul 15;8:1607348. doi: 10.3389/frai.2025.1607348. eCollection 2025.

DOI: 10.3389/frai.2025.1607348
PMID: 40735111
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12303909/
Abstract

INTRODUCTION

Creating training and testing datasets for machine learning algorithms to measure linear dimensions of organs is a tedious task. There are no universally accepted methods for evaluating outliers or anomalies in such datasets. This can cause errors in machine learning and compromise the quality of end products. The goal of this study is to identify optimal methods for detecting organ anomalies and outliers in medical datasets designed to train and test neural networks in morphometrics.

METHODS

A dataset was created containing linear measurements of the spleen obtained from CT scans. Labelling was performed by three radiologists. The sample included studies from n = 197 patients. Outliers and anomalies were sought using visual methods (1.5 × interquartile range rule; heat map; boxplot; histogram; scatter plot), machine learning algorithms (Isolation Forest; Density-Based Spatial Clustering of Applications with Noise; k-nearest neighbors algorithm; Local Outlier Factor; one-class support vector machines; EllipticEnvelope; autoencoders), and mathematical statistics (z-score; Grubbs' test; Rosner's test).
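
As a rough illustration of two of the simpler techniques named above, the 1.5 × interquartile range rule and the z-score, the sketch below flags suspicious values in a list of measurements. It is not the authors' pipeline: the function names, the threshold of 3, the quartile convention, and the sample values are all assumptions made for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" quartiles treat the data as the whole population of interest;
    # other quartile conventions can shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical spleen lengths in mm (not data from the study).
# The value 310 mimics an input error such as a mistyped digit.
lengths_mm = [98, 102, 105, 107, 110, 111, 113, 115, 118, 120, 124, 310]

print(iqr_outliers(lengths_mm))     # [310]
print(zscore_outliers(lengths_mm))  # [310]
```

Both rules flag the same value here, but the z-score only just clears its threshold: in a sample this small a single extreme value inflates the standard deviation it is judged against. Grubbs' and Rosner's tests formalise the z-score idea with critical values from the t-distribution and are omitted from this sketch.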

RESULTS

We identified measurement errors, input errors, abnormal size values, and non-standard organ shapes (sickle-shaped, round, triangular, additional lobules). The most effective methods were visual techniques (including boxplots and histograms) and machine learning algorithms such as one-class support vector machines (OSVM), k-nearest neighbors (KNN), and autoencoders. A total of 32 outliers and anomalies were found.
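
To give a sense of how a KNN-based detector can surface a non-standard shape that a single-variable rule would miss, here is a minimal sketch that scores each case by its mean distance to its nearest neighbours. The scoring rule, the choice of k = 3, the two features, and the sample values are assumptions for illustration; the abstract does not describe the authors' implementation.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical (length, width) pairs in mm (not data from the study).
# The last pair mimics a nearly round spleen: neither value is extreme
# on its own scale, but the combination is unusual.
measurements = [
    (100, 40), (104, 42), (108, 43), (111, 45),
    (115, 46), (118, 48), (122, 50), (95, 92),
]

scores = knn_outlier_scores(measurements)
most_anomalous = max(range(len(measurements)), key=scores.__getitem__)
print(measurements[most_anomalous])  # (95, 92)
```

In a real dataset the features would normally be standardised first so that no single dimension dominates the distance, and a flagged case would still need clinical review to tell a labelling error from a genuine anatomical variant.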

DISCUSSION

Curation of complex morphometric datasets must involve thorough mathematical and clinical analyses. Relying solely on mathematical statistics or machine learning methods appears inadequate.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/106733e3094e/frai-08-1607348-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/8bdaf37331d1/frai-08-1607348-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/448bf77c8981/frai-08-1607348-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/4cf4c61eaf2b/frai-08-1607348-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/5c8441c4a829/frai-08-1607348-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/91fad0fb03ce/frai-08-1607348-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/091052b35d30/frai-08-1607348-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/52dda8f66630/frai-08-1607348-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/075bc436060f/frai-08-1607348-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eb5/12303909/1fa5016b7f1d/frai-08-1607348-g010.jpg