
Predictive Abilities of Machine Learning Techniques May Be Limited by Dataset Characteristics: Insights From the UNOS Database.

Author Affiliations

Section of Cardiovascular Medicine, Yale School of Medicine, New Haven, Connecticut.

Qure.ai, Mumbai, India.

Publication Information

J Card Fail. 2019 Jun;25(6):479-483. doi: 10.1016/j.cardfail.2019.01.018. Epub 2019 Feb 6.

Abstract

BACKGROUND

Traditional statistical approaches to prediction of outcomes have drawbacks when applied to large clinical databases. It is hypothesized that machine learning methodologies might overcome these limitations by considering higher-dimensional and nonlinear relationships among patient variables.

METHODS AND RESULTS

The United Network for Organ Sharing (UNOS) database was queried from 1987 to 2014 for adult patients undergoing cardiac transplantation. The dataset was divided into 3 time periods corresponding to major allocation adjustments, and by geographic region. For our outcome of 1-year survival, we used the standard statistical methods of logistic regression, ridge regression, and regression with LASSO (least absolute shrinkage and selection operator) and compared them with the machine learning methodologies of neural networks, naive Bayes, tree-augmented naive Bayes, support vector machines, random forest, and stochastic gradient boosting. Receiver operating characteristic curves and C-statistics were calculated for each model. C-statistics were used to compare discriminatory capacity across models in the validation sample. Among the 56,477 patients identified, the major univariate predictors of 1-year survival after heart transplantation were consistent with earlier reports and included age, renal function, body mass index, liver function tests, and hemodynamics. Advanced analytic models demonstrated similarly modest discrimination compared with traditional models (C-statistic ≤0.66 for all models). The neural network model demonstrated the highest C-statistic (0.66), but this was only slightly superior to the simple logistic regression, ridge regression, and regression with LASSO models (C-statistic = 0.65 for all three). Discrimination did not vary significantly across the 3 historically important time periods.
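For readers less familiar with the C-statistic used to compare models here, the following is a minimal sketch of how it can be computed for a binary outcome such as 1-year survival. It is not the authors' code: the function name `c_statistic` and the toy outcome and score lists are hypothetical, chosen only for illustration. For a binary outcome, the C-statistic equals the area under the receiver operating characteristic curve, which is the fraction of (survivor, non-survivor) pairs in which the survivor received the higher predicted probability.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (C) statistic for a binary outcome.

    Returns the fraction of (event, non-event) pairs in which the event
    case received the higher predicted score; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each outcome")

    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data (not from UNOS): 1 = alive at 1 year, 0 = died.
outcomes = [1, 1, 1, 0, 1, 0, 1, 0]
# Hypothetical predicted survival probabilities from two models.
model_a = [0.90, 0.80, 0.60, 0.70, 0.85, 0.40, 0.50, 0.55]
model_b = [0.90, 0.80, 0.60, 0.30, 0.85, 0.40, 0.70, 0.55]

print("model A C-statistic:", c_statistic(outcomes, model_a))
print("model B C-statistic:", c_statistic(outcomes, model_b))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is why the reported values of 0.65 to 0.66 are described as modest. The pairwise loop is quadratic in the number of patients; for a dataset of the size described, a rank-based implementation from a statistics library would be the more practical choice.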

CONCLUSIONS

The use of advanced analytic algorithms did not improve prediction of 1-year survival after heart transplantation compared with more traditional prediction models. The prognostic abilities of machine learning techniques may be limited by the quality of the clinical dataset.

