Yang Hyun-Lim, Celi Leo Anthony, Lee Hyeonhoon, Park Seong-A, Lee SangJin, Jung Chul-Woo, Lee Hyung-Chul
Office of Hospital Information, Seoul National University Hospital, Seoul, Republic of Korea; Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, Republic of Korea; Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea.
Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Br J Anaesth. 2025 Sep;135(3):571-581. doi: 10.1016/j.bja.2025.03.024. Epub 2025 May 22.
Several models predict intraoperative hypotension from arterial pressure waveforms. Selection bias in the datasets used for model development and validation could affect model performance. We aimed to evaluate how selection bias affects the predictive performance of a deep learning (DL)-based model and of a model using only mean arterial pressure (MAP) as input (MAP-only model).
We used the VitalDB open dataset. A hypotensive event was defined as a MAP <65 mm Hg for 1 min. For the 'biased dataset', 'non-hypotensive events' had to be (a) at the centre of a 'non-hypotensive period' with a MAP of >75 mm Hg for more than 30 continuous minutes and (b) at least 20 min apart from any hypotensive event. For the 'unbiased dataset', all samples were included except those in which the hypotensive event was already present in the input segment. The alarms per hour and positive predictive values were compared between the DL and MAP-only models.
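The selection rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a MAP series sampled at a fixed rate (`samples_per_min`, hypothetical), reads "for 1 min" as "at least 1 min below 65 mm Hg", and measures the 20-min separation from the start of each hypotensive event. The function names are hypothetical.

```python
import numpy as np

def find_runs(mask):
    """Return (start, end) pairs (end exclusive) for each run of True values."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]

def hypotensive_events(map_mmhg, samples_per_min=1):
    """Start indices of runs with MAP < 65 mm Hg lasting at least 1 min (assumed reading)."""
    runs = find_runs(np.asarray(map_mmhg) < 65)
    return [s for s, e in runs if e - s >= samples_per_min]

def biased_non_events(map_mmhg, samples_per_min=1):
    """Centres of MAP > 75 mm Hg runs longer than 30 min and >= 20 min from any event start."""
    map_mmhg = np.asarray(map_mmhg)
    events = hypotensive_events(map_mmhg, samples_per_min)
    centres = []
    for s, e in find_runs(map_mmhg > 75):
        if e - s > 30 * samples_per_min:          # criterion (a)
            c = (s + e) // 2
            if all(abs(c - ev) >= 20 * samples_per_min for ev in events):  # criterion (b)
                centres.append(c)
    return centres

# Synthetic one-sample-per-minute trace: 40 min stable, 2 min hypotensive, 10 min stable.
trace = [80] * 40 + [60] * 2 + [80] * 10
events = hypotensive_events(trace)
non_events = biased_non_events(trace)
```

In this synthetic trace only the centre of the first 40-min stable period qualifies as a 'non-hypotensive event'; the 10-min stable period after the event is discarded, which illustrates how the biased rule removes borderline samples that an unbiased dataset would keep.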
The DL model generally performed better than the MAP-only model. For the prediction of intraoperative hypotension 5 min before the event with the DL model, using the unbiased vs the biased testing dataset resulted in 18.1 vs 10.8 alarms per hour (P<0.001) and a positive predictive value of 0.068 vs 0.937 (P<0.001).
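For readers less familiar with the two reported metrics, the following minimal sketch shows how they are conventionally computed from alarm counts. The function names and the example counts are hypothetical and are not taken from the study.

```python
def positive_predictive_value(true_alarms, false_alarms):
    """Fraction of alarms that were followed by a hypotensive event."""
    total = true_alarms + false_alarms
    if total == 0:
        return float("nan")  # undefined when no alarm was raised
    return true_alarms / total

def alarms_per_hour(n_alarms, monitored_seconds):
    """Alarm rate normalised to one hour of monitoring."""
    return n_alarms / (monitored_seconds / 3600.0)

# Hypothetical counts: 10 true and 90 false alarms over 5 h of monitoring.
ppv = positive_predictive_value(10, 90)
rate = alarms_per_hour(100, 5 * 3600)
```

Because the positive predictive value depends on how many non-event samples are available to trigger false alarms, the same model can yield very different values on datasets with different sample selection.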
Both the DL model and the MAP-only model showed worse predictive performance when tested on the unbiased dataset than on the biased dataset. Although the DL model performed statistically better than the MAP-only model, the difference between the two models was not clinically meaningful. Clinicians should consider the potential impact of selection bias on the validation and the clinical performance of hypotension prediction models.
ClinicalTrials.gov identifier: NCT02914444.