Al-Helali Baligh, Chen Qi, Xue Bing, Zhang Mengjie
IEEE Trans Cybern. 2024 Jul;54(7):4014-4027. doi: 10.1109/TCYB.2023.3270319. Epub 2024 Jul 11.
Data incompleteness is a serious challenge in real-world machine-learning tasks, yet it has received little attention in symbolic regression (SR). Missing values exacerbate data shortage, especially in domains where little data is available, which in turn limits the learning ability of SR algorithms. Transfer learning (TL), which aims to transfer knowledge across tasks, is a potential remedy because it can compensate for this lack of knowledge. However, TL has not been adequately investigated in SR. To fill this gap, this work proposes a multitree genetic programming-based TL method that transfers knowledge from complete source domains (SDs) to related incomplete target domains (TDs). The proposed method transforms the features from a complete SD to an incomplete TD. Because a large number of features complicates the transformation process, a feature selection mechanism is integrated to eliminate unnecessary transformations. The method is evaluated on real-world and synthetic SR tasks with missing values, covering different learning scenarios. The results demonstrate both the effectiveness of the proposed method and its training efficiency relative to existing TL methods. Compared with state-of-the-art methods, the proposed method reduced the regression error by more than 2.58% and 4% on average on heterogeneous and homogeneous domains, respectively.