Bian Kai, Zhou Mengran, Hu Feng, Lai Wenhao
School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, China.
State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Anhui University of Science and Technology, Huainan, China.
Front Genet. 2020 Sep 9;11:566057. doi: 10.3389/fgene.2020.566057. eCollection 2020.
Breast cancer is one of the most common cancers in women, and its rapid and accurate diagnosis is of great significance for treatment. Artificial intelligence and machine learning algorithms can be used to identify malignant breast tumors, addressing the insufficient recognition accuracy and long diagnosis times of traditional breast cancer diagnosis methods. To this end, we propose a method of attribute selection and feature extraction based on random forest (RF) combined with principal component analysis (PCA) for rapid and accurate diagnosis of breast cancer. First, RF was used to reduce the 30 attributes of the breast cancer categorical data. According to the average attribute importance and the out-of-bag error, 21 relatively important attributes were selected for PCA-based feature extraction. The seven features extracted by PCA were used to build extreme learning machine (ELM) classification models with different activation functions. By comparing the classification accuracy and training time of these models, the sigmoid function was chosen as the activation function of the hidden layer. With 27 neurons in the hidden layer, the test set accuracy was 98.75%, the training set accuracy was 99.06%, and the training time was only 0.0022 s. Finally, to verify the superiority of this method in breast cancer diagnosis, we compared it with an ELM model built on the original breast cancer data and with other intelligent classification algorithm models. The algorithm used in this article achieved a shorter recognition time and a higher recognition accuracy than the other algorithms. We also verified the reliability of the method on breast cancer data consisting of breast tissue reactance features, and ideal results were obtained. The experimental results show that RF-PCA combined with ELM can significantly reduce the time required for the diagnosis of breast cancer, can identify breast cancer rapidly and accurately, and provides a theoretical basis for the intelligent diagnosis of breast cancer.
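To make the PCA and ELM stages described above more concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It runs on synthetic two-class data rather than the breast cancer data, and it omits the RF attribute selection step entirely, so the 21 input attributes are simply assumed to be given. The names `pca_fit`, `ELM`, `make_toy_data`, and `run_demo`, the uniform weight range, the 300/100 split, and the standardization step are all illustrative choices that do not come from the abstract. Only the seven retained components, the 27 hidden neurons, and the sigmoid activation follow the text.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and top principal axes of X (samples x attributes)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ELM:
    """Single-hidden-layer ELM: fixed random hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=27, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden-layer weights and biases are drawn once and never trained.
        # The [-1, 1] range is an assumption, not taken from the abstract.
        self.W = self.rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = sigmoid(X @ self.W + self.b)
        # One-hot targets; output weights via the Moore-Penrose pseudoinverse.
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = sigmoid(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]


def make_toy_data(n=400, n_attrs=21, seed=0):
    """Synthetic two-class data; NOT the breast cancer data used in the paper."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, n_attrs)) + 1.5 * y[:, None]
    return X, y


def run_demo():
    X, y = make_toy_data()
    X_train, X_test = X[:300], X[300:]
    y_train, y_test = y[:300], y[300:]
    # Standardize with training statistics, then project onto 7 components.
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Z_train, Z_test = (X_train - mu) / sd, (X_test - mu) / sd
    mean, axes = pca_fit(Z_train, n_components=7)
    F_train, F_test = (Z_train - mean) @ axes, (Z_test - mean) @ axes
    model = ELM(n_hidden=27, seed=0).fit(F_train, y_train)
    accuracy = float(np.mean(model.predict(F_test) == y_test))
    return F_train.shape, accuracy
```

Calling `run_demo()` returns the shape of the reduced training matrix and the held-out accuracy on the toy data. Because the data are synthetic and easily separable, that accuracy is not comparable to the figures reported in the abstract. The sketch also shows why ELM training is fast: the only fitted quantity is the output weight matrix, obtained from a single pseudoinverse rather than by iterative optimization.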