Mutalik Pradeep, Cheung Kei-Hoi, Green Jennifer, Buelt-Gebhardt Melissa, Anderson Karen F, Jeanpaul Vales, McDonald Linda, Wininger Michael, Li Yuli, Rajeevan Nallakkandi, Jessel Peter M, Moore Hans, Adabag Selçuk, Raitt Merritt H, Aslan Mihaela
VA Cooperative Studies Program Clinical Epidemiology Research Center (CSP-CERC), VA Connecticut Healthcare System, West Haven, CT.
Yale University School of Medicine, New Haven, CT.
AMIA Annu Symp Proc. 2025 May 22;2024:847-856. eCollection 2024.
The aim of this work was to create a gold-standard curated cohort of 10,000+ cases from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) for virtual emulation of a randomized clinical trial (CSP#592). The trial had six inclusion/exclusion criteria that lacked adequate structured data. We therefore used a hybrid computer/human approach to extract the required information from clinical notes. Output from rule-based natural language processing (NLP) was adjudicated iteratively by a panel of trained non-clinician content experts and non-experts using an easy-to-use, spreadsheet-based rapid adjudication display. This group-adjudication process progressively sharpened both the computer algorithm and the clinical decision criteria, while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Fewer than 0.5% of cases required referral to specialist clinicians. Curated datasets of this kind, which capture specialist reasoning and use a process-supervised approach, are likely to become more important as training tools for future clinical AI applications.