Lundbeck, Valby, Denmark; Clinical Memory Research Unit, Lund University, Lund, Sweden.
IBM Danmark, Brøndby, Denmark.
Lancet Digit Health. 2020 May;2(5):e229-e239. doi: 10.1016/S2589-7500(20)30024-8. Epub 2020 Mar 26.
Many individuals who will experience a first episode of psychosis (FEP) are not detected before occurrence, limiting the effect of preventive interventions. The combination of machine-learning methods and electronic health records (EHRs) could help address this gap.
This case-control development and validation study is based on EHR data from IBM Explorys. The IBM Explorys Platform holds standardised, longitudinal, de-identified, patient-level EHR data pooled from different health-care systems with distinct EHRs. The present EHR-based studies were retrospective, matched (1:1), case-control studies compliant with RECORD, STROBE, and TRIPOD statements. The study included individuals in the IBM Explorys database who at some point between 1990 and 2018 had a diagnosis of FEP followed by schizophrenia, and psychosis-free matched control individuals from a random subsample of the full cohort. Each individual in the FEP cohort was matched to one individual in the control cohort with a similar date of inclusion in the database and a similar total observation time. For individuals in the FEP cohort, the index date was defined as the first diagnosis of psychosis or the first prescription of antipsychotic medication. For individuals in the control cohort, the index date was set the same number of days after inclusion in the database as for their matched FEP individual. The FEP and control cohorts were both randomly split into development and validation datasets in a ratio of 7:3. The subset of individuals in the validation dataset who had all their health-care encounters at providers that were not seen in the development dataset made up the external validation subset. A novel recurrent neural network model was developed to predict the risk of FEP 1 year before the index date by employing demographics and medical events (in the categories diagnoses, prescriptions, procedures, encounters and admissions, observations, and laboratory test results) dynamically collected in the EHR as part of clinical routine. We named the recurrent neural network Dynamic ElecTronic hEalth reCord deTection (DETECT). The main outcomes were accuracy and area under the receiver operating characteristic curve (AUROC). Decision-curve analyses and dynamic patient journey plots were used to evaluate clinical usefulness.
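The cohort-construction steps described above (control index dates, the 7:3 split, and the external validation subset) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `providers_by_person` mapping, the fixed seed, and the choice to split matched pairs together are all assumptions made for illustration. Only the rules themselves come from the text.

```python
import random
from datetime import date


def control_index_date(control_inclusion: date, case_inclusion: date, case_index: date) -> date:
    """Place the control's index date the same number of days after
    inclusion in the database as for the matched FEP individual."""
    return control_inclusion + (case_index - case_inclusion)


def split_pairs(pairs, dev_fraction=0.7, seed=0):
    """Randomly split matched (case, control) pairs into development and
    validation sets. Keeping pairs together is an assumption here."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def external_subset(validation_ids, dev_ids, providers_by_person):
    """Return validation individuals whose encounters all took place at
    providers never seen for any development individual."""
    dev_providers = set()
    for person in dev_ids:
        dev_providers |= providers_by_person[person]
    return [
        person
        for person in validation_ids
        # Require at least one recorded provider, none shared with development.
        if providers_by_person[person]
        and providers_by_person[person].isdisjoint(dev_providers)
    ]
```

Splitting by pair keeps each control in the same dataset as its matched case, which is consistent with the development dataset being reported in matching pairs. The abstract does not state how the split was implemented.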
The FEP and control cohorts each comprised 72 860 individuals. 102 030 individuals (51 015 matching pairs) were randomly allocated to the development dataset and the remaining 43 690 to the validation dataset. In the validation dataset, 4770 individuals had all their encounters outside of the 118 790 health-care providers that were encountered in the development dataset. The data from these individuals made up the external validation subset. The median follow-up (observation time before index date) was 6·0 years (IQR 3·0-10·4). In the development dataset, DETECT's prognostic accuracy was 0·787 and AUROC was 0·868. In the validation dataset, DETECT's prognostic accuracy was 0·774 and AUROC was 0·856. In the external validation subset, DETECT's balanced prognostic accuracy was 0·724 and AUROC was 0·799. Prevalence-adjusted decision-curve analyses suggested that DETECT was associated with a positive net benefit in two different scenarios for FEP detection.
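For readers less familiar with the reported metrics, the following Python sketch shows how accuracy, balanced accuracy, AUROC, and a prevalence-adjusted net benefit are commonly computed from labels and risk scores. It is an illustration only. The function names are hypothetical, and the net-benefit expression is the standard case-control form (sensitivity and specificity reweighted by an assumed population prevalence), which may differ from the exact adjustment used in the study.

```python
def sens_spec(y_true, y_score, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(y_true, y_score, threshold):
    """Fraction of individuals classified correctly at the threshold."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(y_true, y_score))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_score, threshold):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    sens, spec = sens_spec(y_true, y_score, threshold)
    return (sens + spec) / 2


def auroc(y_true, y_score):
    """Probability that a random case scores higher than a random control
    (ties count one half), computed by direct pairwise comparison."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(sens, spec, prevalence, p_t):
    """Prevalence-adjusted net benefit at threshold probability p_t.
    `prevalence` is an assumed population value, not the 1:1 study ratio."""
    return sens * prevalence - (1 - spec) * (1 - prevalence) * p_t / (1 - p_t)
```

With equal numbers of cases and controls, accuracy and balanced accuracy coincide. They diverge when the ratio is no longer 1:1, which may be one reason to report balanced accuracy for a subset selected by provider rather than by matched pair.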
DETECT showed adequate prognostic accuracy to detect individuals at risk of developing a FEP in primary and secondary care. Replication and refinement in a population-based setting are needed to consolidate these findings.
Funding: Lundbeck.