Felice Florian, Ley Christophe, Bordas Stéphane P A, Groll Andreas
Department of Mathematics, University of Luxembourg, 4364, Esch-sur-Alzette, Luxembourg.
Department of Engineering, University of Luxembourg, 4364, Esch-sur-Alzette, Luxembourg.
Sci Rep. 2025 Jan 10;15(1):1605. doi: 10.1038/s41598-024-84702-8.
Feature engineering is of critical importance in the field of Data Science. While every data scientist knows the importance of rigorously preparing data to obtain well-performing models, little of the literature formalizes its benefits. In this work, we present Statistically Enhanced Learning (SEL), a framework that formalizes existing feature engineering and extraction tasks in Machine Learning (ML). In contrast to existing approaches, the predictors are not directly observed but are obtained as statistical estimators. Our goal is to study SEL, establish a formal framework for it, and illustrate its improved performance by means of simulations as well as applications to practical use cases.
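To make the idea of a predictor "obtained as a statistical estimator" concrete, the following is a minimal simulated sketch and not the authors' implementation. It assumes a latent group-level quantity that is never observed directly, estimates it by a sample mean from auxiliary data, and feeds that estimate to a plain least-squares fit. All names (`theta`, `theta_hat`, `fit_mse`), the data-generating process, and the choice of linear regression are illustrative assumptions; the comparison is in-sample only, for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_aux, n_obs = 50, 30, 20  # hypothetical sizes for the simulation

# Latent group-level parameter: it drives the response but is never observed.
theta = rng.normal(0.0, 1.0, size=n_groups)

# Auxiliary samples per group, from which the latent parameter can be estimated.
aux = rng.normal(theta[:, None], 1.0, size=(n_groups, n_aux))
theta_hat = aux.mean(axis=1)  # statistical estimator used as a feature

# Learning data: one directly observed covariate x plus the group label.
group = np.repeat(np.arange(n_groups), n_obs)
x = rng.normal(size=group.size)
y = 1.5 * x + 2.0 * theta[group] + rng.normal(0.0, 0.5, size=group.size)


def fit_mse(features, target):
    """Fit ordinary least squares with an intercept; return in-sample MSE."""
    X = np.column_stack([np.ones(len(target)), *features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(np.mean(resid ** 2))


# Baseline uses only the observed covariate; the enhanced model adds the
# estimated feature in place of the unobservable theta.
mse_baseline = fit_mse([x], y)
mse_enhanced = fit_mse([x, theta_hat[group]], y)

print(f"baseline MSE: {mse_baseline:.3f}")
print(f"enhanced MSE: {mse_enhanced:.3f}")
```

In this toy setting the estimated feature recovers most of the signal carried by the unobserved parameter, which is the intuition behind replacing unobservable predictors with estimators; a held-out evaluation would be needed to say anything about predictive performance.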