Huang Zhixing, Mei Yi, Zhong Jinghui
IEEE Trans Cybern. 2024 Feb;54(2):1321-1334. doi: 10.1109/TCYB.2022.3181461. Epub 2024 Jan 17.
Symbolic regression (SR) is an important problem with many applications, such as automatic programming and data mining. Genetic programming (GP) is a commonly used technique for SR. In the past decade, a branch of GP that uses program behavior to guide the search, called semantic GP (SGP), has achieved great success in solving SR problems. However, existing SGP methods focus only on the tree-based chromosome representation and usually suffer from bloat and unsatisfactory generalization ability. To address these issues, we propose a new semantic linear GP (SLGP) algorithm. In SLGP, we design a new chromosome representation that encodes programs and their semantic information in a linear fashion. To use the semantic information more effectively, we further propose a novel semantic genetic operator, mutate-and-divide propagation, which recursively propagates the semantic error within the linear program. The empirical results show that the proposed method achieves lower training and test errors than state-of-the-art algorithms on SR problems and can produce much smaller programs.
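To make the idea of a linear program with attached semantic information concrete, here is a minimal sketch of generic register-based linear GP evaluation. It is not the SLGP chromosome representation or the mutate-and-divide propagation operator, whose details the abstract does not give. The instruction format `(dest, op, src_a, src_b)`, the names `run_linear_program` and `rmse`, and the choice to initialize every register with the input are all assumptions made for illustration. "Semantics" here means the vector of values an instruction produces over the training inputs.

```python
import math
import operator

# Hypothetical primitive set; a real SR system would likely include more
# operators (e.g. protected division).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_linear_program(program, xs, n_registers=4):
    """Run a register-based linear program on a list of scalar inputs.

    Each instruction is (dest, op, src_a, src_b). Every register starts
    as a copy of the input vector (an assumption of this sketch).
    Returns the final contents of register 0 and the semantics (output
    vector) recorded after each instruction.
    """
    registers = [list(xs) for _ in range(n_registers)]
    semantics = []
    for dest, op, src_a, src_b in program:
        fn = OPS[op]
        result = [fn(u, v) for u, v in zip(registers[src_a], registers[src_b])]
        registers[dest] = result
        semantics.append(result)
    return registers[0], semantics


def rmse(predicted, target):
    """Root-mean-square error between two equal-length vectors."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# A two-instruction program computing x^2 + x.
program = [
    (1, "mul", 1, 1),  # r1 = x * x
    (0, "add", 1, 0),  # r0 = r1 + r0, where r0 still holds x
]
xs = [0.0, 1.0, 2.0, 3.0]
target = [x * x + x for x in xs]

output, semantics = run_linear_program(program, xs)
error = rmse(output, target)
```

Recording the output vector of every instruction is what lets a semantic operator compare intermediate behavior against a desired target, rather than looking only at the final fitness value.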