Cunningham Jonathan W, Singh Pulkit, Reeder Christopher, Claggett Brian, Marti-Castellote Pablo M, Lau Emily S, Khurshid Shaan, Batra Puneet, Lubitz Steven A, Maddah Mahnaz, Philippakis Anthony, Desai Akshay S, Ellinor Patrick T, Vardeny Orly, Solomon Scott D, Ho Jennifer E
Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, Massachusetts.
Cardiovascular Disease Initiative, Broad Institute of Harvard University and the Massachusetts Institute of Technology, Cambridge, Massachusetts.
medRxiv. 2023 Aug 23:2023.08.17.23294234. doi: 10.1101/2023.08.17.23294234.
The gold standard for outcome adjudication in clinical trials is chart review by a physician clinical events committee (CEC), which requires substantial time and expertise. Automated adjudication by natural language processing (NLP) may offer a more resource-efficient alternative. We previously showed that the Community Care Cohort Project (C3PO) NLP model adjudicates heart failure (HF) hospitalizations accurately within one healthcare system.
This study externally validated the C3PO NLP model against CEC adjudication in the INVESTED trial. INVESTED compared influenza vaccination formulations in 5260 patients with cardiovascular disease at 157 North American sites. A central CEC adjudicated the cause of hospitalizations from medical records. We applied the C3PO NLP model to medical records from 4060 INVESTED hospitalizations and evaluated agreement between the NLP and final consensus CEC HF adjudications. We then used half of the INVESTED hospitalizations both to fine-tune the C3PO NLP model (C3PO+INVESTED) and to train a separate NLP model, and evaluated both models in the other half. NLP performance was benchmarked against CEC reviewer inter-rater reproducibility.
The CEC adjudicated 1074 hospitalizations (26%) as HF. Agreement between the C3PO NLP and CEC HF adjudications was high (agreement 87%, kappa statistic 0.69). C3PO NLP model sensitivity was 94% and specificity was 84%. The fine-tuned C3PO model and the NLP model trained on INVESTED each reached agreement of 93%, with kappa of 0.82 and 0.83, respectively. CEC reviewer inter-rater reproducibility was 94% (kappa 0.85).
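The agreement statistics reported above (percent agreement, Cohen's kappa, sensitivity, specificity) can all be derived from a 2x2 table of NLP versus CEC adjudications. The sketch below shows the arithmetic. The function name `agreement_metrics` is hypothetical, and the cell counts are assumptions back-calculated from the rounded percentages in this abstract (4060 hospitalizations, 1074 CEC-adjudicated HF, sensitivity 94%, specificity 84%); they are not the study's actual confusion matrix.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Compute agreement statistics from a 2x2 table.

    tp: NLP = HF,     CEC = HF
    fp: NLP = HF,     CEC = not HF
    fn: NLP = not HF, CEC = HF
    tn: NLP = not HF, CEC = not HF
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # raw percent agreement
    # Agreement expected by chance, from the marginal totals of each rater.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)  # Cohen's kappa
    return {
        "agreement": observed,
        "kappa": kappa,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only (assumed, reconstructed from rounded figures):
# 1074 CEC-positive = 1010 + 64; 2986 CEC-negative = 2508 + 478.
metrics = agreement_metrics(tp=1010, fp=478, fn=64, tn=2508)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts the output rounds to the reported values (agreement 0.87, kappa 0.69, sensitivity 0.94, specificity 0.84). Kappa sits well below raw agreement because, with 26% prevalence, two raters would agree on more than half of cases by chance alone.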
Our NLP model developed within a single healthcare system accurately identified HF events relative to the gold-standard CEC in an external multi-center clinical trial. Fine-tuning the model improved agreement and approximated human reproducibility. NLP may improve the efficiency of future multi-center clinical trials by accurately identifying clinical events at scale.