Foundations of Machine Learning-Based Clinical Prediction Modeling: Part V: A Practical Approach to Regression Problems.

Affiliations

Machine Intelligence in Clinical Neuroscience (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland.

Neurosurgical Artificial Intelligence Laboratory Aachen (NAILA), Department of Neurosurgery, RWTH Aachen University Hospital, Aachen, Germany.

Publication information

Acta Neurochir Suppl. 2022;134:43-50. doi: 10.1007/978-3-030-85292-4_6.

Abstract

This chapter goes through the steps required to train and validate a simple, machine learning-based clinical prediction model for any continuous outcome. We supply fully structured code for readers to download and execute in parallel to this section, as well as a simulated database of 10,000 glioblastoma patients who underwent microsurgery, with survival from diagnosis in months as the outcome to be predicted. We walk the reader through each step, including import, checking, and splitting of data. In terms of pre-processing, we focus on how to practically implement imputation using a k-nearest neighbor algorithm. We also illustrate how to select features based on recursive feature elimination and how to use k-fold cross-validation. We demonstrate a generalized linear model, a generalized additive model, a random forest, a ridge regressor, and a Least Absolute Shrinkage and Selection Operator (LASSO) regressor. Specifically for regression, we discuss how to evaluate root mean square error (RMSE), mean absolute error (MAE), and the R² statistic, as well as how a quantile-quantile plot can be used to assess the performance of the regressor along the spectrum of the outcome variable, similarly to calibration when dealing with binary outcomes. Finally, we explain how to arrive at a measure of variable importance using a universal, nonparametric method.
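The three regression metrics named in the abstract can be illustrated with a minimal sketch. This is not the chapter's own code, which is not reproduced on this page. The function names and the four survival values below are hypothetical, chosen only to make the arithmetic easy to follow. R² is computed here as the coefficient of determination (1 minus the ratio of residual to total sum of squares); some software reports the squared correlation between observed and predicted values instead, which can give a different number.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical survival times in months (illustrative values only,
# not taken from the chapter's simulated database).
observed = [10.0, 14.0, 18.0, 22.0]
predicted = [12.0, 13.0, 17.0, 24.0]

print(f"RMSE: {rmse(observed, predicted):.3f} months")
print(f"MAE:  {mae(observed, predicted):.3f} months")
print(f"R^2:  {r_squared(observed, predicted):.3f}")
```

RMSE and MAE are both expressed in the units of the outcome (here months). RMSE penalizes large errors more heavily than MAE, so a gap between the two hints at a few large misses.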

