Measuring the prediction difficulty of individual cases in a dataset using machine learning.

Authors

Kwon Hyunjin, Greenberg Matthew, Josephson Colin Bruce, Lee Joon

Affiliations

Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, Alberta, Canada.

Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Alberta, Canada.

Publication

Sci Rep. 2024 May 7;14(1):10474. doi: 10.1038/s41598-024-61284-z.

DOI:10.1038/s41598-024-61284-z
PMID:38714895
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11076552/
Abstract

Different levels of prediction difficulty are one of the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully-connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the prediction made by the first model is likely to be correct. The third metric assesses the variability of the neural network's predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics were better able to differentiate various levels of difficulty than most of the existing metrics and show constant effectiveness across diverse datasets. We expect our metrics will provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
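The abstract describes the three metrics only at a high level. The sketch below shows one plausible way to turn each idea into a per-case score. It is not the authors' implementation, and the definitions in the paper may differ. It makes several assumptions: trained networks are replaced by precomputed predictions or a callable, and the function names `complexity_difficulty`, `pair_difficulty`, and `variability_difficulty` are hypothetical. The "complexity ladder" is taken to be a list of hidden-layer sizes. Variability is measured as the population standard deviation of each case's predicted probability across repeated runs, for example runs with different random seeds.

```python
"""Hypothetical sketches of three per-case difficulty scores (not the paper's code)."""
from statistics import pstdev
from typing import Callable, List, Sequence

# Assumed interface: given a hidden-layer size, train a network and return
# its predicted label for every case (a stand-in for real training).
PredictFn = Callable[[int], Sequence[int]]


def complexity_difficulty(labels: Sequence[int], predict_with_size: PredictFn,
                          sizes: Sequence[int]) -> List[float]:
    """Metric 1 idea: how far up a ladder of model sizes a case must go
    before it is predicted correctly. 0.0 means the smallest model solves it;
    1.0 means no model on the ladder solves it."""
    scores = [1.0] * len(labels)
    solved = [False] * len(labels)
    for rank, size in enumerate(sizes):
        preds = predict_with_size(size)
        for i, (pred, label) in enumerate(zip(preds, labels)):
            # Record only the first (smallest) size that gets this case right.
            if not solved[i] and pred == label:
                solved[i] = True
                scores[i] = rank / len(sizes)
    return scores


def pair_difficulty(p_correct: Sequence[float]) -> List[float]:
    """Metric 2 idea: a second model estimates the probability that the first
    model's prediction for each case is correct; difficulty is the complement."""
    return [1.0 - p for p in p_correct]


def variability_difficulty(prob_runs: Sequence[Sequence[float]]) -> List[float]:
    """Metric 3 idea: spread of each case's predicted probability across
    repeated runs. prob_runs[r][i] is run r's probability for case i."""
    return [pstdev(case_probs) for case_probs in zip(*prob_runs)]


if __name__ == "__main__":
    labels = [0, 1, 1, 0]
    # Toy stand-in predictions for three hypothetical hidden-layer sizes.
    toy_preds = {2: [0, 0, 0, 1], 8: [0, 1, 0, 1], 32: [0, 1, 0, 0]}
    print(complexity_difficulty(labels, toy_preds.__getitem__, [2, 8, 32]))
    print(pair_difficulty([0.95, 0.6, 0.3, 0.8]))
    print(variability_difficulty([[0.1, 0.7, 0.5, 0.2],
                                  [0.1, 0.9, 0.2, 0.3],
                                  [0.1, 0.8, 0.8, 0.1]]))
```

In the toy run, case 2 is never predicted correctly by any model on the ladder, so it receives the maximum complexity score of 1.0. Case 0 is solved by the smallest model and scores 0.0. Scoring by rank on the ladder, rather than by raw hidden-layer size, keeps the first score in the range [0, 1] whatever sizes are chosen.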


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/e8799531c096/41598_2024_61284_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/7a76bac223f8/41598_2024_61284_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/9ea6dcef1118/41598_2024_61284_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/4bb1bad5803b/41598_2024_61284_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/71e948583915/41598_2024_61284_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/cb707e4e0a59/41598_2024_61284_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/b661e42a536a/41598_2024_61284_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de98/11076552/ecf180e75414/41598_2024_61284_Fig8_HTML.jpg

Similar articles

1. Measuring the prediction difficulty of individual cases in a dataset using machine learning.
Sci Rep. 2024 May 7;14(1):10474. doi: 10.1038/s41598-024-61284-z.
2. Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.
J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
3. Comparative Study of Machine-Learning Frameworks for the Elaboration of Feed-Forward Neural Networks by Varying the Complexity of Impedimetric Datasets Synthesized Using Eddy Current Sensors for the Characterization of Bi-Metallic Coins.
Sensors (Basel). 2022 Feb 9;22(4):1312. doi: 10.3390/s22041312.
4. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
5. Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models.
Comput Methods Programs Biomed. 2022 Jan;213:106504. doi: 10.1016/j.cmpb.2021.106504. Epub 2021 Oct 28.
6. Enabling data-limited chemical bioactivity predictions through deep neural network transfer learning.
J Comput Aided Mol Des. 2022 Dec;36(12):867-878. doi: 10.1007/s10822-022-00486-x. Epub 2022 Oct 22.
7. Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach.
J Med Internet Res. 2020 Sep 25;22(9):e20268. doi: 10.2196/20268.
8. Using a deep convolutional network to predict the longitudinal dispersion coefficient.
J Contam Hydrol. 2021 Jun;240:103798. doi: 10.1016/j.jconhyd.2021.103798. Epub 2021 Mar 19.
9. Effects of noise on mental performance and annoyance considering task difficulty level and tone components of noise.
J Environ Health Sci Eng. 2019 Apr 16;17(1):353-365. doi: 10.1007/s40201-019-00353-2. eCollection 2019 Jun.
10. Network Assessor: an automated method for quantitative assessment of a network's potential for gene function prediction.
Front Genet. 2014 May 16;5:123. doi: 10.3389/fgene.2014.00123. eCollection 2014.
