Turbulence Structure Laboratory, School of Mechanical Engineering, Tel Aviv University, Tel Aviv, Israel.
Department of Cancer Biology, Cancer Institute, University College London, London, UK.
Sci Rep. 2023 Jan 23;13(1):1249. doi: 10.1038/s41598-023-28328-2.
Discovering a meaningful symbolic expression that explains experimental data is a fundamental challenge in many scientific fields. We present a novel, open-source computational framework called Scientist-Machine Equation Detector (SciMED), which combines scientific-discipline knowledge, through a scientist-in-the-loop approach, with state-of-the-art symbolic regression (SR) methods. SciMED combines a wrapper feature-selection method based on a genetic algorithm with automatic machine learning (AutoML) and two levels of SR methods. We test SciMED on five configurations of a settling sphere, with and without nonlinear aerodynamic drag force, and with excessive noise in the measurements. We show that SciMED is sufficiently robust to discover the correct, physically meaningful symbolic expressions from the data, and we demonstrate how integrating domain knowledge enhances its performance. Our results indicate better performance on these tasks than state-of-the-art SR software packages, even when no domain knowledge is integrated. Moreover, we demonstrate how SciMED, unlike most current SR systems, can alert the user to possibly missing features.
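The abstract names a wrapper feature-selection step driven by a genetic algorithm. The following is a minimal, generic sketch of that technique, not SciMED's actual implementation. The fitness function (least-squares R² minus a hypothetical per-feature penalty), the GA settings, and the synthetic settling-sphere data are all illustrative assumptions. The data follow Stokes' law for terminal velocity, v = 2 Δρ g r² / (9 μ), which is linear in the logarithms of r, Δρ and μ. Three random "decoy" columns are appended so the selector has something to reject.

```python
import random
import numpy as np

def subset_score(X, y, mask, penalty=0.01):
    """Wrapper fitness (assumption): R^2 of an OLS fit on the selected columns,
    minus a hypothetical penalty per selected feature to favour small subsets."""
    if not any(mask):
        return -np.inf
    cols = X[:, np.array(mask, dtype=bool)]
    A = np.column_stack([cols, np.ones(len(y))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return r2 - penalty * sum(mask)

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """Generic GA over boolean feature masks: truncation selection,
    one-point crossover and bit-flip mutation (illustrative settings)."""
    rng = random.Random(seed)
    n = X.shape[1]
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: subset_score(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]  # keep the better half (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: subset_score(X, y, m))

# Synthetic settling-sphere data (assumed ranges), in natural-log space.
data_rng = np.random.default_rng(0)
m = 200
log_r = data_rng.uniform(-4, -2, m)      # ln(sphere radius)
log_drho = data_rng.uniform(0, 3, m)     # ln(density difference)
log_mu = data_rng.uniform(-3, 0, m)      # ln(dynamic viscosity)
decoys = data_rng.normal(size=(m, 3))    # irrelevant features
X = np.column_stack([log_r, log_drho, log_mu, decoys])
y = np.log(2 * 9.81 / 9) + 2 * log_r + log_drho - log_mu  # ln(v) from Stokes' law

selected = ga_select(X, y)
print(selected)
```

On this noise-free toy data, the penalty term should make the selector keep only the three physical columns. A real wrapper method would typically score subsets with a stronger learner, such as the AutoML model the abstract mentions, rather than plain least squares.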