Confidence interval for micro-averaged F1 and macro-averaged F1 scores.

Author information

Takahashi Kanae, Yamamoto Kouji, Kuchiba Aya, Koyama Tatsuki

Affiliations

Department of Medical Statistics, Osaka City University Graduate School of Medicine, Osaka, Japan.

Department of Biostatistics, Hyogo College of Medicine, Hyogo, Japan.

Publication information

Appl Intell (Dordr). 2022 Mar;52(5):4961-4972. doi: 10.1007/s10489-021-02635-5. Epub 2021 Jul 31.

Abstract

Binary classification problems are common in the medical field, and we often use sensitivity, specificity, accuracy, and negative and positive predictive values as measures of the performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity). As a single summary measure of a classifier's performance, the F1 score, defined as the harmonic mean of precision and recall, is widely used in information retrieval and information extraction evaluation because it has favorable characteristics, especially when prevalence is low. Some statistical methods for inference have been developed for the F1 score in binary classification problems; however, they have not been extended to multi-class classification. There are three types of F1 scores, and the statistical properties of these F1 scores have hardly ever been discussed. We propose methods based on the large-sample multivariate central limit theorem for estimating F1 scores with confidence intervals.
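To make the quantities in the abstract concrete, here is a minimal Python sketch. It assumes a confusion matrix laid out as cm[i][j] = number of items with true class i and predicted class j, with exactly one true and one predicted label per item. It computes the binary F1 as the harmonic mean of precision and recall, the micro-averaged F1 (which, under that single-label assumption, reduces to overall accuracy), and the macro-averaged F1 as the unweighted mean of per-class F1 scores. Each comes with a Wald-type interval obtained by the delta method under a multinomial model. The function names and the choice of a delta-method interval are illustrative assumptions: this is not necessarily the estimator the authors derive, and it does not cover every F1 variant the abstract refers to.

```python
import math
from statistics import NormalDist


def f1_from_pr(precision, recall):
    """Binary F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def _z(level):
    # Two-sided standard normal quantile, about 1.96 for level=0.95.
    return NormalDist().inv_cdf(0.5 + level / 2)


def micro_f1_ci(cm, level=0.95):
    """Micro-F1 with a Wald CI; cm[i][j] = count(true=i, predicted=j) (assumed layout)."""
    r = len(cm)
    n = sum(sum(row) for row in cm)
    # With one true and one predicted label per item, pooled TP/(TP+FP) and
    # TP/(TP+FN) both equal the diagonal share, so micro-F1 equals accuracy.
    mi = sum(cm[i][i] for i in range(r)) / n
    se = math.sqrt(mi * (1 - mi) / n)  # binomial standard error
    z = _z(level)
    return mi, (mi - z * se, mi + z * se)


def macro_f1_ci(cm, level=0.95):
    """Macro-F1 (mean of per-class F1) with a delta-method Wald CI (a sketch)."""
    r = len(cm)
    n = sum(sum(row) for row in cm)
    p = [[c / n for c in row] for row in cm]  # multinomial cell proportions
    row_tot = [sum(p[i]) for i in range(r)]
    col_tot = [sum(p[j][i] for j in range(r)) for i in range(r)]

    f1 = []
    grad = [[0.0] * r for _ in range(r)]  # d(macro-F1) / d p[j][k]
    for i in range(r):
        s = row_tot[i] + col_tot[i]  # assumes class i occurs in truth or predictions
        f1.append(2 * p[i][i] / s)  # F1_i = 2 p_ii / (p_i. + p_.i)
        # Off-diagonal cells in row i or column i enter s_i exactly once.
        d_off = -2 * p[i][i] / s**2 / r
        for k in range(r):
            if k != i:
                grad[i][k] += d_off
                grad[k][i] += d_off
        # p_ii enters the numerator once and s_i twice.
        grad[i][i] += 2 * (s - 2 * p[i][i]) / s**2 / r

    ma = sum(f1) / r
    # Delta method: Var = g^T (diag(p) - p p^T) g / n.
    m1 = sum(grad[j][k] * p[j][k] for j in range(r) for k in range(r))
    m2 = sum(grad[j][k] ** 2 * p[j][k] for j in range(r) for k in range(r))
    se = math.sqrt(max(m2 - m1**2, 0.0) / n)
    z = _z(level)
    return ma, (ma - z * se, ma + z * se)
```

For the 3-class matrix [[50, 5, 5], [10, 40, 10], [5, 5, 70]] (200 items, 160 on the diagonal), micro_f1_ci returns a point estimate of 0.8, the overall accuracy. Normal-approximation intervals like these can extend beyond [0, 1] or behave poorly when n is small or a class is rare, so they are best read as large-sample approximations.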
