de Franca F O, Virgolin M, Kommenda M, Majumder M S, Cranmer M, Espada G, Ingelse L, Fonseca A, Landajuela M, Petersen B, Glatt R, Mundhenk N, Lee C S, Hochhalter J D, Randall D L, Kamienny P, Zhang H, Dick G, Simon A, Burlacu B, Kasak J, Machado M, Wilstrup C, La Cava W G
Center for Mathematics, Computation and Cognition (CMCC), Heuristics, Analysis and Learning Laboratory (HAL), Federal University of ABC, Santo Andre, Brazil.
Evolutionary Intelligence group, Centrum Wiskunde & Informatica, Science Park 123, Amsterdam, Netherlands.
IEEE Trans Evol Comput. 2025 Aug;29(4):1127-1134. doi: 10.1109/tevc.2024.3423681. Epub 2024 Jul 4.
Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main promise of this approach is that it may return an interpretable model that is insightful to users while maintaining high accuracy. The current standard for benchmarking these algorithms is SRBench, which evaluates methods on hundreds of datasets that are a mix of real-world and simulated processes spanning multiple domains. At present, the ability of SRBench to evaluate interpretability is limited to measuring the size of expressions on real-world data and the exactness of model forms on synthetic data. In practice, model size is only one of many factors used by subject experts to determine how interpretable a model truly is. Furthermore, SRBench does not characterize algorithm performance on specific, challenging sub-tasks of regression such as feature selection and avoidance of local minima. In this work, we propose and evaluate an approach to benchmarking SR algorithms that addresses these limitations of SRBench by 1) incorporating expert evaluations of interpretability on a domain-specific task, and 2) evaluating algorithms over distinct properties of data science tasks. We evaluate 12 modern symbolic regression algorithms on these benchmarks, present an in-depth analysis of the results, discuss current challenges of symbolic regression algorithms, and highlight possible improvements to the benchmark itself.
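To make the core idea concrete: symbolic regression searches a space of analytic expressions for one that fits the data, returning a human-readable formula rather than an opaque model. The following is a minimal toy sketch (not any algorithm from the paper or from SRBench) that exhaustively enumerates small expression trees over {+, -, *} and picks the one with the lowest mean squared error; real SR systems search vastly larger spaces with evolutionary or learned strategies.

```python
# Toy symbolic regression by exhaustive enumeration of small
# expression trees. Illustrative only: real SR methods (e.g. genetic
# programming) search much larger spaces stochastically.

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]  # the variable and a few constants

def evaluate(expr, x):
    """Evaluate an expression tree at input value x."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def to_str(expr):
    """Render the tree as a readable infix formula."""
    if expr == "x" or isinstance(expr, float):
        return str(expr)
    op, left, right = expr
    return f"({to_str(left)} {op} {to_str(right)})"

def enumerate_exprs(depth):
    """Yield all expression trees up to the given depth."""
    if depth == 0:
        yield from TERMINALS
        return
    subtrees = list(enumerate_exprs(depth - 1))
    yield from subtrees
    for op in OPS:
        for left in subtrees:
            for right in subtrees:
                yield (op, left, right)

def mse(expr, xs, ys):
    """Mean squared error of the expression on the data."""
    return sum((evaluate(expr, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Data generated from a hidden ground truth: y = x^2 + x.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + x for x in xs]

best = min(enumerate_exprs(2), key=lambda e: mse(e, xs, ys))
print(to_str(best))  # an expression equivalent to x*x + x
```

Because the ground truth lies inside the enumerated space, the search recovers an exact (zero-error) formula; on real-world data, SR methods instead trade off accuracy against expression size, which is exactly the interpretability question the benchmark above probes.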