Breast Cancer Prediction Using Fine Needle Aspiration Features and Upsampling with Supervised Machine Learning.

Authors

Shafique Rahman, Rustam Furqan, Choi Gyu Sang, Díez Isabel de la Torre, Mahmood Arif, Lipari Vivian, Velasco Carmen Lili Rodríguez, Ashraf Imran

Affiliations

Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea.

School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland.

Publication information

Cancers (Basel). 2023 Jan 22;15(3):681. doi: 10.3390/cancers15030681.

Abstract

Breast cancer is one of the most common invasive cancers in women, and it remains a worldwide medical problem because the number of cases has increased significantly over the past decade. It is the second leading cause of cancer death in women. Early detection can save lives, but the traditional approach to detecting breast cancer requires various laboratory tests involving medical experts. To reduce human error and speed up detection, an automatic system is required that performs the diagnosis accurately and promptly. Despite research efforts on automated cancer detection, a wide gap exists between the desired accuracy and the accuracy that current approaches provide. To overcome this issue, this research proposes an approach for breast cancer prediction by selecting the best fine needle aspiration features. To enhance prediction accuracy, several feature selection techniques are applied and their efficacy is analyzed, such as principal component analysis (PCA), singular value decomposition (SVD), and chi-square (Chi2). Extensive experiments are performed with different features and different feature set sizes to identify the optimal feature set. Additionally, the influence of imbalanced versus balanced data, with balancing done by the SMOTE approach, is investigated. Six classifiers, including random forest, support vector machine, gradient boosting machine, logistic regression, multilayer perceptron, and K-nearest neighbors (KNN), are tuned to achieve increased classification accuracy. Results indicate that KNN outperforms all other classifiers on the dataset used, reaching a 100% accuracy score with 20 features using SVD and with the 15 most important features using PCA.
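The two ideas the abstract leans on most, SMOTE upsampling and KNN classification, can be illustrated with a short sketch. This is not the authors' code: it is a simplified, pure-Python version written for illustration, and the toy 2-D points, the function names (`smote`, `knn_predict`, `nearest_neighbors`), and the values of `k` are all assumptions. The study presumably used library implementations on the real fine needle aspiration features, together with PCA, SVD, or Chi2 feature selection, none of which is reproduced here.

```python
import math
import random
from collections import Counter


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point` by Euclidean distance."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolation (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours come from the minority class only, excluding the base itself.
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up 2-D data (NOT the study's data): 6 benign vs 3 malignant samples.
benign = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0), (1.3, 1.1), (0.9, 0.8), (1.2, 1.4)]
malignant = [(3.0, 3.2), (3.4, 2.9), (2.8, 3.1)]

# Upsample the minority class until both classes have the same size.
extra = smote(malignant, n_new=len(benign) - len(malignant), k=2)
X = benign + malignant + extra
y = ["B"] * len(benign) + ["M"] * (len(malignant) + len(extra))

print(Counter(y))                     # class counts after balancing
print(knn_predict(X, y, (3.1, 3.0)))  # label predicted for a new point
```

Each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours, which is why SMOTE balances the classes without simply duplicating rows. In a real experiment the upsampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.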

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9196/9913345/a487751c13e8/cancers-15-00681-g001.jpg
