A novel double machine learning approach for detecting early breast cancer using advanced feature selection and dimensionality reduction techniques.

Authors

Athisayamani Suganya, S Tamilazhagan, Singh A Robert, Hwang Jae-Yong, Joshi Gyanendra Prasad

Affiliations

Department of Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India.

School of Computing, Sastra Deemed to be University, Thanjavur, Tamil Nadu, India.

Publication information

Sci Rep. 2025 Jul 2;15(1):22971. doi: 10.1038/s41598-025-06426-7.

DOI: 10.1038/s41598-025-06426-7
PMID: 40596255

Abstract

This paper proposes three Double Machine Learning (DML) models to improve the accuracy of breast cancer detection on a breast cancer detection dataset. Each DML model learns the primary features with a machine learning model and a deep learning model, then fuses those features in a meta-classifier to achieve the best classification performance. The first DML model combines the interpretability of Random Forest (RF) with the deep learning capabilities of a Feedforward Neural Network (FNN). RF processes structured features and provides class probabilities and feature importance scores, while the FNN learns non-linear relationships and generates embeddings. These outputs are fused into a combined feature vector, which a meta-classifier uses for the final prediction. This approach captures both structured features and non-linear patterns, making it suitable for datasets with complex dependencies. The second model pairs eXtreme Gradient Boosting (XGBoost), a highly efficient boosting algorithm for tabular data, with an Artificial Neural Network (ANN). XGBoost optimizes decision tree ensembles and provides class probabilities, while the ANN processes numerical data to learn deeper representations. A meta-classifier then uses the fused outputs of XGBoost and the ANN for the final prediction. This model is particularly effective for datasets that combine structured features (handled by XGBoost) with numerical features (handled by the ANN). The third model integrates LightGBM, a fast and scalable gradient-boosting framework, with an ANN, which is well suited to analyzing sequential data. LightGBM processes structured features to provide probabilities and importance scores, while the ANN learns temporal dependencies from sequential data. The outputs of LightGBM and the ANN are concatenated and passed to a meta-classifier for decision-making. This model is ideal for datasets with both static features (LightGBM) and continuous data (ANN), such as time-series datasets or datasets with sequential dependencies. Combined with dimensionality reduction (PCA) and feature selection, these DML models significantly improve the performance of breast cancer detection systems by leveraging both structured and sequential data, reaching an accuracy of 0.99.
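The first model in the abstract (RF plus FNN, fused by a meta-classifier after feature selection and PCA) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. Several choices here are assumptions that the abstract does not specify: the Wisconsin dataset bundled with scikit-learn stands in for the unnamed "breast cancer detection dataset", `MLPClassifier` stands in for the FNN, class probabilities stand in for the FNN embeddings and RF importance scores, logistic regression is the meta-classifier, and the function names and the values `n_selected=20` and `n_components=10` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_dml_sketch(n_selected=20, n_components=10, seed=0):
    """Feature selection -> PCA -> (RF + FNN) -> meta-classifier.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("fnn", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                              random_state=seed)),
    ]
    # StackingClassifier builds out-of-fold class probabilities from each
    # base learner and feeds them to the final estimator, which plays the
    # role of the meta-classifier.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,
    )
    return make_pipeline(
        StandardScaler(),                     # scale features for PCA and the FNN
        SelectKBest(f_classif, k=n_selected), # univariate feature selection
        PCA(n_components=n_components, random_state=seed),
        stack,
    )


def evaluate(seed=0):
    """Fit the sketch on a stratified split and return (model, test accuracy)."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = build_dml_sketch(seed=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    _, accuracy = evaluate()
    print(f"held-out accuracy: {accuracy:.3f}")
```

Putting the scaler, selector, and PCA inside the pipeline means they are fitted on the training split only, so the held-out score is not inflated by leakage. Swapping the `rf` entry for an XGBoost or LightGBM classifier would give a rough analogue of the second and third models, again under the same assumptions.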
