Kwon Hyunjin, Park Jinhyeok, Lee Youngho
Department of IT Convergence Engineering, Gachon University, Seongnam, Korea.
Department of Computer Engineering, Gachon University, Seongnam, Korea.
Healthc Inform Res. 2019 Oct;25(4):283-288. doi: 10.4258/hir.2019.25.4.283. Epub 2019 Oct 31.
Breast cancer is the second most common cancer among Korean women. Because breast cancer is strongly associated with negative emotional and physical changes, early detection and treatment are very important. To support breast cancer classification, we sought to identify the best meta-learner model in a stacking ensemble when the same machine learning models are used for both the base learners and the meta-learner.
We used machine learning models in a stacking ensemble, including the gradient boosted model (GBM), distributed random forest (DRF), generalized linear model (GLM), and deep neural network (DNN). These models served as the base learners, and each of them was then used in turn as the meta-learner. We then compared the performance of the models in the meta-learner role to determine the best meta-learner model in the stacking ensemble.
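The procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not name the software used, so scikit-learn estimators stand in for the GBM, DRF, GLM, and DNN, and scikit-learn's bundled Wisconsin diagnostic breast cancer dataset stands in for the study data. The function names `make_models` and `compare_meta_learners`, the hyperparameters, and the 70/30 split are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_models():
    """Return fresh, unfitted stand-ins for the four model families (assumed settings)."""
    return {
        "GBM": GradientBoostingClassifier(n_estimators=50, random_state=0),
        "DRF": RandomForestClassifier(n_estimators=50, random_state=0),
        "GLM": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "DNN": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0),
        ),
    }


def compare_meta_learners(X, y, seed=0):
    """Train the base learners once, then try each model family as the meta-learner."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Level-one features: out-of-fold probabilities on the training split,
    # and probabilities from a full refit on the held-out split.
    train_cols, test_cols = [], []
    for model in make_models().values():
        oof = cross_val_predict(model, X_tr, y_tr, cv=5, method="predict_proba")
        train_cols.append(oof[:, 1])
        test_cols.append(model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
    Z_tr = np.column_stack(train_cols)
    Z_te = np.column_stack(test_cols)

    # Each model family takes a turn as the meta-learner on the same features.
    results = {}
    for name, meta in make_models().items():
        prob = meta.fit(Z_tr, y_tr).predict_proba(Z_te)[:, 1]
        accuracy = float(np.mean((prob >= 0.5).astype(int) == y_te))
        rmse = float(np.sqrt(np.mean((prob - y_te) ** 2)))
        results[name] = {"accuracy": accuracy, "rmse": rmse}
    return results


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    for name, scores in compare_meta_learners(X, y).items():
        print(f"{name} as meta-learner: {scores}")
```

Building the level-one features from out-of-fold predictions keeps each meta-learner from seeing base-learner outputs on rows those base learners were trained on. Because the features are computed once and shared, differences in the reported scores come only from the choice of meta-learner.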
Experimental results showed that using the GBM as the meta-learner yielded higher accuracy than any other model on the breast cancer data, and that using the GLM as the meta-learner yielded a low root-mean-squared error (RMSE) on both sets of breast cancer data.
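The two metrics reported here measure different things, which is why they can favor different meta-learners. The short sketch below illustrates this on made-up numbers. It assumes the RMSE is computed between the predicted class-1 probability and the 0/1 label, and that accuracy uses a 0.5 threshold; the abstract does not state either detail, and the helper names and toy values are hypothetical.

```python
import math


def accuracy(labels, probs, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the 0/1 label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(labels, probs))
    return hits / len(labels)


def rmse(labels, probs):
    """Root-mean-squared error between predicted probabilities and 0/1 labels."""
    squared = [(p - y) ** 2 for y, p in zip(labels, probs)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values for illustration only; they are not results from the study.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.4, 0.8]

print(accuracy(labels, probs))  # 3 of 4 cases are on the correct side of 0.5
print(rmse(labels, probs))      # sqrt((0.01 + 0.04 + 0.36 + 0.04) / 4)
```

Accuracy only checks which side of the threshold each prediction falls on, whereas RMSE also penalizes how far each probability is from the true label.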
We compared the performance of every meta-learner model in a stacking ensemble as a supporting tool for classifying breast cancer. The study showed that using specific models as the meta-learner resulted in better performance than single classifiers, and that the GBM and GLM are appropriate meta-learners for supporting the classification of breast cancer data.