Boosting any learning algorithm with Statistically Enhanced Learning.

Authors

Felice Florian, Ley Christophe, Bordas Stéphane P A, Groll Andreas

Affiliations

Department of Mathematics, University of Luxembourg, 4364, Esch-sur-Alzette, Luxembourg.

Department of Engineering, University of Luxembourg, 4364, Esch-sur-Alzette, Luxembourg.

Publication information

Sci Rep. 2025 Jan 10;15(1):1605. doi: 10.1038/s41598-024-84702-8.

DOI:10.1038/s41598-024-84702-8
PMID:39794482
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11724061/
Abstract

Feature engineering is of critical importance in the field of Data Science. While any data scientist knows the importance of rigorously preparing data to obtain good performing models, only scarce literature formalizes its benefits. In this work, we present the method of Statistically Enhanced Learning (SEL), a formalization framework of existing feature engineering and extraction tasks in Machine Learning (ML). Contrary to existing approaches, predictors are not directly observed but obtained as statistical estimators. Our goal is to study SEL, aiming to establish a formalized framework and illustrate its improved performance by means of simulations as well as applications on practical use cases.
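The central idea in the abstract, that a predictor can be a statistical estimator rather than a directly observed value, can be illustrated with a small simulation. The sketch below is not the authors' implementation. The helper names (`sel_features`, `fit_ols`, `predict_ols`, `mse`), the simulated data, and the choice of sample mean and standard deviation as estimators are all assumptions made for illustration. Each unit has an auxiliary raw sample, which is summarised by estimators and appended to the observed predictor before fitting an ordinary least-squares learner.

```python
import numpy as np

def sel_features(groups):
    """Summarise each unit's raw sample by two estimators: sample mean and sample std."""
    return np.array([[np.mean(g), np.std(g, ddof=1)] for g in groups])

def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predict with coefficients produced by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def mse(y_true, y_pred):
    """Mean squared error as a plain float."""
    return float(np.mean((y_true - y_pred) ** 2))

# Simulated data (an assumption of this sketch, not taken from the paper).
rng = np.random.default_rng(0)
n = 200
x_obs = rng.normal(size=n)                       # directly observed predictor
latent = rng.normal(loc=0.0, scale=2.0, size=n)  # per-unit quantity, never observed
groups = [rng.normal(loc=m, scale=1.0, size=30) for m in latent]  # auxiliary raw samples
y = 1.5 * x_obs + 2.0 * latent + rng.normal(scale=0.5, size=n)

X_base = x_obs.reshape(-1, 1)                           # observed predictor only
X_sel = np.column_stack([x_obs, sel_features(groups)])  # plus estimated predictors

# Hold out the last 50 units to compare out-of-sample error.
train, test = slice(0, 150), slice(150, n)
mse_base = mse(y[test], predict_ols(fit_ols(X_base[train], y[train]), X_base[test]))
mse_sel = mse(y[test], predict_ols(fit_ols(X_sel[train], y[train]), X_sel[test]))

print(f"hold-out MSE without estimated features: {mse_base:.3f}")
print(f"hold-out MSE with estimated features:    {mse_sel:.3f}")
```

In this toy setting the estimated mean stands in for the unobserved per-unit quantity, so the augmented model should show a lower hold-out error. The gain depends entirely on how the data were simulated and says nothing about the results reported in the paper.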

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/0b2521927a22/41598_2024_84702_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/08bd4d58a05e/41598_2024_84702_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/dab9e3f8d13e/41598_2024_84702_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/cecf79f36b6f/41598_2024_84702_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/111491d6c7a1/41598_2024_84702_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/94c2b53d9187/41598_2024_84702_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4805/11724061/d7f963df6bb8/41598_2024_84702_Fig7_HTML.jpg

Similar articles

1. Boosting any learning algorithm with Statistically Enhanced Learning.
   Sci Rep. 2025 Jan 10;15(1):1605. doi: 10.1038/s41598-024-84702-8.
2. Ontology-based feature engineering in machine learning workflows for heterogeneous epilepsy patient records.
   Sci Rep. 2022 Nov 12;12(1):19430. doi: 10.1038/s41598-022-23101-3.
3. A data-guided approach for the evaluation of zeolites for hydrogen storage with the aid of molecular simulations.
   J Mol Model. 2024 Jan 18;30(2):43. doi: 10.1007/s00894-024-05837-z.
4. Analysis of the fatigue status of medical security personnel during the closed-loop period using multiple machine learning methods: a case study of the Beijing 2022 Olympic Winter Games.
   Sci Rep. 2024 Apr 18;14(1):8987. doi: 10.1038/s41598-024-59397-6.
5. A machine learning-based framework to identify type 2 diabetes through electronic health records.
   Int J Med Inform. 2017 Jan;97:120-127. doi: 10.1016/j.ijmedinf.2016.09.014. Epub 2016 Oct 1.
6. Risk score prediction model based on single nucleotide polymorphism for predicting malaria: a machine learning approach.
   BMC Bioinformatics. 2022 Aug 7;23(1):325. doi: 10.1186/s12859-022-04870-0.
7. Predicting energy use in construction using Extreme Gradient Boosting.
   PeerJ Comput Sci. 2023 Aug 7;9:e1500. doi: 10.7717/peerj-cs.1500. eCollection 2023.
8. An interpretable time series machine learning method for varying forecast and nowcast lengths in wastewater-based epidemiology.
   MethodsX. 2023 Sep 27;11:102382. doi: 10.1016/j.mex.2023.102382. eCollection 2023 Dec.
9. A Machine Learning Framework for Automated Accident Detection Based on Multimodal Sensors in Cars.
   Sensors (Basel). 2022 May 10;22(10):3634. doi: 10.3390/s22103634.
10. Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.
    J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
