OBJECTIVE

Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet the real-world accuracy of these models in clinical practice, and on different patient subpopulations, remains unclear. To address these questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. The challenge question was the prediction of all-cause mortality.

MATERIALS AND METHODS

Using a Model-to-Data framework, 345 registered participants, forming 25 independent teams from 10 countries on 3 continents, generated 25 accurate models. All models were trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected during a 1-year observation period in a large health system.
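The abstract does not describe the framework's internals. As a rough illustration only, assuming the core idea is that participants ship their model to the data holder, which runs it on private records and returns just a score, the information flow can be sketched as below. All names (`run_submission`, `prevalence_baseline`, `brier`) and the record layout are hypothetical; real Model-to-Data setups commonly run submissions in isolated containers, which this in-process function does not emulate.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layouts: a feature tuple per patient, and training rows
# pairing features with an outcome label (1 = died during follow-up).
Features = Tuple[float, ...]
TrainRow = Tuple[Features, int]
Submission = Callable[[Sequence[TrainRow], Sequence[Features]], List[float]]


def run_submission(
    submission: Submission,
    train_rows: Sequence[TrainRow],
    eval_features: Sequence[Features],
    eval_labels: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[float]], float],
) -> float:
    """Run a participant's model where the data lives. Only the aggregate
    score is returned; the submission receives evaluation features but never
    the evaluation labels."""
    scores = submission(train_rows, eval_features)
    if len(scores) != len(eval_features):
        raise ValueError("submission must return one score per evaluation patient")
    return metric(eval_labels, scores)


def brier(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean squared error between predicted risk and the observed outcome."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


def prevalence_baseline(train_rows, eval_features):
    """A trivial 'model': predict the training mortality rate for everyone."""
    rate = sum(label for _, label in train_rows) / len(train_rows)
    return [rate] * len(eval_features)


# Toy data (age only); not from the challenge.
train = [((70.0,), 1), ((40.0,), 0), ((55.0,), 0), ((82.0,), 1)]
print(run_submission(prevalence_baseline, train, [(65.0,), (30.0,)], [1, 0], brier))  # → 0.25
```

The participant only ever sees the number printed at the end, which is what lets organizers evaluate models on data that cannot be shared.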

RESULTS

The top-performing team achieved a final area under the receiver operating characteristic curve (AUROC) of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve (AUPRC) of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort.
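To make the two reported metrics concrete, the sketch below computes the area under the ROC curve (via its rank interpretation), a step-wise area under the precision-recall curve (average precision), and a percentile bootstrap confidence interval. This is a generic illustration; the challenge's exact scoring code and CI procedure are not given in the abstract, so the bootstrap settings (`n_boot`, `seed`) are assumptions. The pairwise AUROC is quadratic and only suitable for small examples.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (ties count as one half). O(P*N): fine for
    illustration, far too slow for a cohort of a million patients."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC (average precision): the mean of the precision values
    observed at the rank of each positive, scanning from highest to lowest
    score. Tied scores are not grouped, so ties can depend on input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)


def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement, recompute
    the metric, and take the alpha/2 and 1 - alpha/2 quantiles."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one positive and one negative label")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            stats.append(metric(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy labels (1 = died) and predicted risks; not data from the challenge.
    labels = [0, 0, 1, 1, 0, 1, 0, 0]
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.05, 0.3]
    print(f"AUROC {auroc(labels, scores):.3f}, 95% CI {bootstrap_ci(labels, scores, auroc)}")
    print(f"AUPRC {average_precision(labels, scores):.3f}")
```

The gap between a high AUROC and a much lower AUPRC is typical when the outcome is rare: AUROC ignores class prevalence, while precision drops as false positives from the large negative class accumulate.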

DISCUSSION

Post hoc analysis revealed that models differ in accuracy across subpopulations defined by race or gender, even when they are trained on the same data.
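A subgroup comparison of this kind amounts to scoring each subpopulation separately. The sketch below computes AUROC per group; the group labels "A" and "B" are hypothetical placeholders, and this is not claimed to be the authors' post hoc procedure. With real data, each per-group estimate would also need a confidence interval before differences are interpreted.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_group(labels, scores, groups):
    """Compute AUROC separately for each subgroup value (e.g. a demographic
    field). Groups containing only one outcome class are skipped because
    AUROC is undefined for them."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {
        g: auroc(ys, ss)
        for g, (ys, ss) in by_group.items()
        if 0 < sum(ys) < len(ys)
    }


# Toy example: one model's scores, evaluated on two hypothetical subgroups.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
scores = [0.1, 0.9, 0.2, 0.8, 0.7, 0.3, 0.2, 0.6]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(auroc_by_group(labels, scores, groups))  # → {'A': 1.0, 'B': 0.5}
```

Running the same function over several models' scores on identical patients is one way to see that models trained on the same data can rank patients very differently within a subgroup.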

CONCLUSION

To date, this is the largest community challenge focused on evaluating state-of-the-art machine learning methods in a healthcare system, and it reveals both opportunities and pitfalls of clinical AI.
