IEEE Trans Cybern. 2020 Feb;50(2):425-439. doi: 10.1109/TCYB.2018.2859342. Epub 2018 Aug 16.
Although feature selection for large data has been intensively investigated in data mining, machine learning, and pattern recognition, the challenge is not merely to invent new algorithms that handle noisy and uncertain large data in applications, but to link multiple relevant feature sources, structured or unstructured, into an effective feature reduction method. In this paper, we propose a multiple relevant feature ensemble selection (MRFES) algorithm based on multilayer co-evolutionary consensus MapReduce (MCCM). We construct an MCCM model to handle feature ensemble selection on large-scale datasets with multiple relevant feature sources, and we explore the unified consistency aggregation between the local solutions and the global dominance solutions achieved by the co-evolutionary memeplexes that participate in the cooperative feature ensemble selection process. The model attempts to reach a mutual decision agreement among the co-evolutionary memeplexes, which calls for mechanisms that detect noncooperative co-evolutionary behaviors and achieve better Nash equilibrium resolutions. Extensive comparative experiments on well-known benchmark datasets substantiate the effectiveness of MRFES on large-scale dataset problems with complex noise and multiple relevant feature sources. The algorithm facilitates the selection of relevant feature subsets from the original feature space with better accuracy, efficiency, and interpretability. Moreover, we apply MRFES to human cerebral cortex-based classification prediction. Such applications are expected to significantly scale up classification prediction for large-scale and complex brain data in terms of efficiency and feasibility.
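The general idea of selecting features locally on data partitions and then aggregating the local choices into one agreed subset can be illustrated with a minimal sketch. It is not the MCCM model or the MRFES algorithm: the abstract gives no implementation details, so the co-evolutionary memeplexes and the Nash equilibrium resolution are replaced here by two assumed stand-ins, namely absolute Pearson correlation as the relevance score and a simple quorum vote as the consensus step. All function names and parameters (`map_select`, `reduce_consensus`, `ensemble_select`, `top_k`, `quorum`) are hypothetical.

```python
from collections import Counter
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def map_select(shard, top_k):
    """Map step (assumed): rank features on one shard, return the local top-k."""
    rows, labels = shard
    n_features = len(rows[0])
    scores = [
        abs_correlation([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:top_k])


def reduce_consensus(local_selections, quorum):
    """Reduce step (assumed): keep features chosen by at least `quorum` of shards."""
    votes = Counter(j for selected in local_selections for j in selected)
    needed = quorum * len(local_selections)
    return sorted(j for j, count in votes.items() if count >= needed)


def ensemble_select(rows, labels, n_shards=3, top_k=2, quorum=0.5):
    """Split rows into interleaved shards, select locally, aggregate by vote."""
    shards = [
        (rows[i::n_shards], labels[i::n_shards]) for i in range(n_shards)
    ]
    return reduce_consensus([map_select(s, top_k) for s in shards], quorum)
```

Calling `ensemble_select(rows, labels)` with `rows` as a list of equal-length numeric feature vectors and `labels` as a numeric list returns the sorted indices of the features that enough shards agreed on. Raising `quorum` makes the aggregation stricter, which is the simplest analogue of requiring broader agreement among the local solutions.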