A comparative study on feature selection for a risk prediction model for colorectal cancer.

Affiliations

Department of Electrical, Systems and Automatic Engineering, Universidad de León, Campus de Vegazana s/n, León 24071, Spain.

Grupo Investigación Interacciones Gen-Ambiente y Salud (GIIGAS), Centro de Investigación Biomédica en Red (CIBER), Spain.

Publication information

Comput Methods Programs Biomed. 2019 Aug;177:219-229. doi: 10.1016/j.cmpb.2019.06.001. Epub 2019 Jun 4.

PMID: 31319951

Abstract

BACKGROUND AND OBJECTIVE

Risk prediction models aim to identify people at higher risk of developing a target disease. Feature selection is particularly important both for improving prediction model performance while avoiding overfitting and for identifying the leading cancer risk (and protective) factors. Assessing the stability of feature selection and ranking algorithms becomes important when the aim is to analyze the features with the greatest predictive power.

METHODS

This work focuses on colorectal cancer and assesses several feature ranking algorithms in terms of the performance of a set of risk prediction models (Neural Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest Neighbors and Boosted Trees). Their robustness is also evaluated in two ways: a conventional approach based on scalar stability metrics, and a visual approach proposed in this work that examines both the similarity among feature ranking techniques and their individual stability. Finally, the most relevant features found in this study are compared with features provided by experts according to state-of-the-art knowledge.
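As a rough illustration of what a scalar stability metric can look like, the sketch below ranks features by absolute Pearson correlation with the class label, keeps the top k, and measures how much the selected subsets agree across bootstrap resamples. The abstract does not name the stability metrics actually used, so the mean pairwise Jaccard index here is an assumption, and all function names (`pearson`, `top_k_features`, `jaccard`, `selection_stability`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uninformative
    return cov / (vx * vy) ** 0.5


def top_k_features(X, y, k):
    """Indices of the k features with the largest |Pearson r| against y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two non-empty sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def selection_stability(X, y, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of top-k subsets over bootstrap resamples.

    Assumed metric for illustration only; 1.0 means the same k features
    were selected in every resample.
    """
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        subsets.append(top_k_features([X[i] for i in idx], [y[i] for i in idx], k))
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))
```

A value near 1.0 indicates that the ranking method picks almost the same features regardless of which samples it sees, while a value near 0 indicates that the selected subset depends heavily on the particular resample.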

RESULTS

The two best results in terms of Area Under the ROC Curve (AUC) are achieved by an SVM classifier using the top 41 features selected by the SVM wrapper approach (AUC=0.693) and by Logistic Regression using the top 40 features selected by the Pearson correlation coefficient (AUC=0.689). Feature selection improves classification performance: AUC increases by 3.9% for the SVM classifier and by 1.9% for the Logistic Regression classifier relative to the full feature set. The visual approach proposed in this work shows that the Neural Network-based wrapper ranking is the most unstable, while Random Forest is the most stable.
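For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition and adds a helper for a relative AUC change. The abstract does not say whether the reported 3.9% and 1.9% gains are relative or absolute, nor what the full-feature-set AUCs were, so `relative_improvement` is only one possible reading; both function names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels; scores: classifier scores.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_improvement(auc_selected, auc_full):
    """Percentage change in AUC relative to the full-feature baseline.

    Assumes a relative (not absolute) improvement, which the abstract
    does not confirm.
    """
    return 100.0 * (auc_selected - auc_full) / auc_full


# Toy data, unrelated to the study: three of four pairs are ranked correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise definition is quadratic in the number of samples, which is fine for a small illustration; rank-based formulas are the usual choice for larger datasets.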

CONCLUSIONS

This study demonstrates that stability and model performance should be studied jointly: Random Forest turned out to be the most stable algorithm but was outperformed by others in terms of model performance, whereas the SVM wrapper and the Pearson correlation coefficient are moderately stable while achieving good model performance.
