Confidence interval for micro-averaged F1 and macro-averaged F1 scores.

Authors

Takahashi Kanae, Yamamoto Kouji, Kuchiba Aya, Koyama Tatsuki

Affiliations

Department of Medical Statistics, Osaka City University Graduate School of Medicine, Osaka, Japan.

Department of Biostatistics, Hyogo College of Medicine, Hyogo, Japan.

Publication information

Appl Intell (Dordr). 2022 Mar;52(5):4961-4972. doi: 10.1007/s10489-021-02635-5. Epub 2021 Jul 31.

DOI:10.1007/s10489-021-02635-5

PMID:35317080

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8936911/

Abstract

Binary classification problems are common in the medical field, and sensitivity, specificity, accuracy, and negative and positive predictive values are often used to measure the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier's performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in information retrieval and information extraction evaluation because it has favorable characteristics, especially when the prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of F1 scores, and the statistical properties of these F1 scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
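As an illustration of the quantities the abstract refers to, the following is a minimal sketch of multi-class F1 point estimates computed from a confusion matrix. It is not the authors' code and does not implement their confidence intervals. The function name `f1_scores`, the matrix layout, and the choice of the third variant (the harmonic mean of macro-averaged precision and macro-averaged recall) are assumptions made for this example; the abstract itself does not spell out the three types.

```python
def f1_scores(confusion):
    """Point estimates of multi-class F1 variants (illustrative sketch).

    Assumed layout: confusion[i][j] is the number of items whose true
    class is i and whose predicted class is j (single-label setting).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[i][i] for i in range(k)]
    true_count = [sum(confusion[i]) for i in range(k)]  # row sums
    pred_count = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # column sums

    def ratio(num, den):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    precision = [ratio(tp[i], pred_count[i]) for i in range(k)]
    recall = [ratio(tp[i], true_count[i]) for i in range(k)]
    per_class_f1 = [ratio(2 * p * r, p + r) for p, r in zip(precision, recall)]

    # With one label per item, pooled precision and pooled recall both
    # equal sum(tp) / total, so micro-averaged F1 reduces to accuracy.
    micro_f1 = ratio(sum(tp), total)

    # Macro-averaged F1: arithmetic mean of the per-class F1 scores.
    macro_f1 = sum(per_class_f1) / k

    # Assumed third variant: harmonic mean of macro precision and macro recall.
    macro_p = sum(precision) / k
    macro_r = sum(recall) / k
    macro_f1_star = ratio(2 * macro_p * macro_r, macro_p + macro_r)

    return {
        "micro_f1": micro_f1,
        "macro_f1": macro_f1,
        "macro_f1_star": macro_f1_star,
    }


# Hypothetical 3-class confusion matrix, used only for illustration.
example = [
    [5, 1, 0],
    [1, 3, 1],
    [0, 1, 3],
]
scores = f1_scores(example)
print(scores)
```

These are point estimates only. The interval estimates described in the abstract additionally require the variance of each score, which the authors derive from the large-sample multivariate central limit theorem.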

