Huang David, Cogill Steven, Hsia Renee Y, Yang Samuel, Kim David
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Veterans Affairs, Seattle, WA, USA.
NPJ Digit Med. 2023 Jul 19;6(1):131. doi: 10.1038/s41746-023-00875-y.
Non-accidental trauma (NAT) is deadly and difficult to predict. Transformer models pretrained on large datasets have recently produced state-of-the-art performance on diverse prediction tasks, but the optimal pretraining strategies for diagnostic predictions are not known. Here we report the development and external validation of Pretrained and Adapted BERT for Longitudinal Outcomes (PABLO), a transformer-based deep learning model with multitask clinical pretraining, to identify patients who will receive a diagnosis of NAT in the next year. We develop a clinical interface to visualize patient trajectories, model predictions, and individual risk factors. In two comprehensive statewide databases, approximately 1% of patients experience NAT within one year of prediction. PABLO predicts NAT events with an area under the receiver operating characteristic curve (AUROC) of 0.844 (95% CI 0.838-0.851) in the California test set and 0.849 (95% CI 0.846-0.851) on external validation in Florida, outperforming comparator models. Multitask pretraining significantly improves model performance. Attribution analysis shows substance use, psychiatric, and injury diagnoses, in the context of age and racial demographics, as influential predictors of NAT. As a clinical decision support system, PABLO can identify high-risk patients and patient-specific risk factors, which can be used to target secondary screening and preventive interventions at the point of care.
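For readers unfamiliar with the headline metric, the following is a minimal sketch of how an AUROC and a 95% confidence interval can be computed from binary labels and predicted risk scores. It is not the authors' evaluation code: the abstract does not say how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names `auroc` and `bootstrap_ci` and the toy data are hypothetical. The pairwise AUROC is quadratic in sample size and is meant only for small illustrations.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method,
    not one stated in the abstract)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap an AUROC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.05, 0.30]
point = auroc(toy_labels, toy_scores)
lower, upper = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

With an event rate near 1%, as reported here, a resample can occasionally contain no positive cases, which is why the sketch skips single-class resamples rather than failing.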