
An alternative approach to dimension reduction for Pareto distributed data: a case study.

Authors

Roccetti Marco, Delnevo Giovanni, Casini Luca, Mirri Silvia

Affiliation

Department of Computer Science and Engineering, University of Bologna, Via Mura Anteo Zamboni 7, 40127 Bologna, Italy.

Publication information

J Big Data. 2021;8(1):39. doi: 10.1186/s40537-021-00428-8. Epub 2021 Feb 25.

DOI:10.1186/s40537-021-00428-8
PMID:33649714
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7905765/
Abstract

Deep learning models are tools for data analysis suitable for approximating (non-linear) relationships among variables for the best prediction of an outcome. While these models can be used to answer many important questions, their utility is still harshly criticized, being extremely challenging to identify which data are the most adequate to represent a given specific phenomenon of interest. With a recent experience in the development of a deep learning model designed to detect failures in mechanical water meter devices, we have learnt that a sensible deterioration of the prediction accuracy can occur if one tries to train a deep learning model by adding specific device descriptors, based on categorical data. This can happen because of an excessive increase in the dimensions of the data, with a correspondent loss of statistical significance. After several unsuccessful experiments conducted with alternative methodologies that either permit to reduce the data space dimensionality or employ more traditional machine learning algorithms, we changed the training strategy, reconsidering that categorical data, in the light of a Pareto analysis. In essence, we used those categorical descriptors, not as an input on which to train our deep learning model, but as a tool to give a new shape to the dataset, based on the Pareto rule. With this data adjustment, we trained a more performative deep learning model able to detect defective water meter devices with a prediction accuracy in the range 87-90%, even in the presence of categorical descriptors.
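
The abstract does not say exactly how the Pareto rule reshapes the dataset, so the following is only a minimal sketch of one plausible reading. It assumes that a categorical descriptor is used to select the few most frequent categories that together cover roughly 80% of the records, and that the descriptor is then dropped instead of being fed to the model. The function names, the `coverage` threshold, the `brand` descriptor, and the toy records are all hypothetical and do not come from the paper.

```python
from collections import Counter


def pareto_categories(values, coverage=0.8):
    """Return the most frequent categories, in descending frequency, that
    together account for at least `coverage` of all records.
    (Hypothetical helper; the 0.8 default mirrors the usual 80/20 rule.)"""
    counts = Counter(values)
    total = sum(counts.values())
    kept, covered = [], 0
    for category, n in counts.most_common():
        if covered >= coverage * total:
            break
        kept.append(category)
        covered += n
    return kept


def reshape_by_pareto(records, descriptor, coverage=0.8):
    """Keep only records whose descriptor falls in the Pareto head, then
    drop the descriptor so it is never used as a model input."""
    head = set(pareto_categories([r[descriptor] for r in records], coverage))
    return [
        {k: v for k, v in r.items() if k != descriptor}
        for r in records
        if r[descriptor] in head
    ]


# Toy data (assumed): 11 meters whose "brand" descriptor is heavily skewed.
records = (
    [{"brand": "A", "reading": 10.0 + i, "faulty": 0} for i in range(6)]
    + [{"brand": "B", "reading": 20.0 + i, "faulty": 1} for i in range(3)]
    + [{"brand": "C", "reading": 30.0, "faulty": 0}]
    + [{"brand": "D", "reading": 40.0, "faulty": 1}]
)

reshaped = reshape_by_pareto(records, "brand", coverage=0.8)
print(len(records), "->", len(reshaped), "records; columns:", sorted(reshaped[0]))
```

In this sketch the input dimensionality stays at the numeric features only, whereas one-hot encoding `brand` would add one column per category. Whether the authors filtered, re-weighted, or partitioned the data is not stated in the abstract, so the filtering step here should be read as an assumption.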

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f12/7905765/8626820d3e30/40537_2021_428_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f12/7905765/d78943630836/40537_2021_428_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f12/7905765/211849891b3f/40537_2021_428_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f12/7905765/e5eecfdbe356/40537_2021_428_Fig4_HTML.jpg

Similar articles

1
An alternative approach to dimension reduction for pareto distributed data: a case study.
J Big Data. 2021;8(1):39. doi: 10.1186/s40537-021-00428-8. Epub 2021 Feb 25.
2
Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.
Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.
3
Deep Learning Methods for Predicting Disease Status Using Genomic Data.
J Biom Biostat. 2018;9(5). Epub 2018 Dec 11.
4
Machine learning algorithms, bull genetic information, and imbalanced datasets used in abortion incidence prediction models for Iranian Holstein dairy cattle.
Prev Vet Med. 2020 Feb;175:104869. doi: 10.1016/j.prevetmed.2019.104869. Epub 2019 Dec 17.
5
A novel end-to-end classifier using domain transferred deep convolutional neural networks for biomedical images.
Comput Methods Programs Biomed. 2017 Mar;140:283-293. doi: 10.1016/j.cmpb.2016.12.019. Epub 2017 Jan 6.
6
Evaluation of deep and shallow learning methods in chemogenomics for the prediction of drugs specificity.
J Cheminform. 2020 Feb 10;12(1):11. doi: 10.1186/s13321-020-0413-0.
7
Machine learning models based on the dimensionality reduction of standard automated perimetry data for glaucoma diagnosis.
Artif Intell Med. 2019 Mar;94:110-116. doi: 10.1016/j.artmed.2019.02.006. Epub 2019 Feb 25.
8
Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.
J Am Med Inform Assoc. 2019 Nov 1;26(11):1247-1254. doi: 10.1093/jamia/ocz149.
9
Development of Supervised Learning Predictive Models for Highly Non-linear Biological, Biomedical, and General Datasets.
Front Mol Biosci. 2020 Feb 13;7:13. doi: 10.3389/fmolb.2020.00013. eCollection 2020.
10
Brain tumor classification for MR images using transfer learning and fine-tuning.
Comput Med Imaging Graph. 2019 Jul;75:34-46. doi: 10.1016/j.compmedimag.2019.05.001. Epub 2019 May 18.

Cited by

1
Fault Detection in Induction Machines Using Learning Models and Fourier Spectrum Image Analysis.
Sensors (Basel). 2025 Jan 15;25(2):471. doi: 10.3390/s25020471.
2
Identifying First-Trimester Risk Factors for SGA-LGA Using Weighted Inheritance Voting Ensemble Learning.
Bioengineering (Basel). 2024 Jun 27;11(7):657. doi: 10.3390/bioengineering11070657.
3
D-MAINS: A Deep-Learning Model for the Label-Free Detection of Mitosis, Apoptosis, Interphase, Necrosis, and Senescence in Cancer Cells.
Cells. 2024 Jun 8;13(12):1004. doi: 10.3390/cells13121004.
4
Environmental resilience through artificial intelligence: innovations in monitoring and management.
Environ Sci Pollut Res Int. 2024 Mar;31(12):18379-18395. doi: 10.1007/s11356-024-32404-z. Epub 2024 Feb 15.
5
A Deep Learning Approach for Predicting Multiple Sclerosis.
Micromachines (Basel). 2023 Mar 29;14(4):749. doi: 10.3390/mi14040749.
6
Classification of Pulmonary Damage Stages Caused by COVID-19 Disease from CT Scans via Transfer Learning.
Bioengineering (Basel). 2022 Dec 20;10(1):6. doi: 10.3390/bioengineering10010006.
7
Identification of Neurodegenerative Diseases Based on Vertical Ground Reaction Force Classification Using Time-Frequency Spectrogram and Deep Learning Neural Network Features.
Brain Sci. 2021 Jul 8;11(7):902. doi: 10.3390/brainsci11070902.
8
Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media.
J Big Data. 2021;8(1):95. doi: 10.1186/s40537-021-00488-w. Epub 2021 Jul 2.

References

1
Characterizing superspreading events and age-specific infectiousness of SARS-CoV-2 transmission in Georgia, USA.
Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):22430-22435. doi: 10.1073/pnas.2011802117. Epub 2020 Aug 20.
2
Cross-Modal Metric Learning for AUC Optimization.
IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):4844-4856. doi: 10.1109/TNNLS.2017.2769128. Epub 2018 Jan 4.
3
Extensions to Online Feature Selection Using Bagging and Boosting.
IEEE Trans Neural Netw Learn Syst. 2018 Sep;29(9):4504-4509. doi: 10.1109/TNNLS.2017.2746107. Epub 2017 Oct 11.
4
A problem of dimensionality: a simple example.
IEEE Trans Pattern Anal Mach Intell. 1979 Mar;1(3):306-7. doi: 10.1109/tpami.1979.4766926.
5
Alignment of overlapping locally scaled patches for multidimensional scaling and dimensionality reduction.
IEEE Trans Pattern Anal Mach Intell. 2008 Mar;30(3):438-50. doi: 10.1109/TPAMI.2007.70706.
6
Biometrics from brain electrical activity: a machine learning approach.
IEEE Trans Pattern Anal Mach Intell. 2007 Apr;29(4):738-42. doi: 10.1109/TPAMI.2007.1013.
7
Superspreading and the effect of individual variation on disease emergence.
Nature. 2005 Nov 17;438(7066):355-9. doi: 10.1038/nature04153.
8
Long short-term memory.
Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.
9
Combined statistical study of joint angles and ground reaction forces using component and multiple correspondence analysis.
IEEE Trans Biomed Eng. 1994 Dec;41(12):1160-7. doi: 10.1109/10.335864.