Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
Department of Cardiology, Meander Medical Center, Amersfoort, the Netherlands; Department of Cardiology, Division Heart & Lungs, University Medical Center Utrecht, Utrecht University, Utrecht, the Netherlands.
J Clin Epidemiol. 2021 Apr;132:97-105. doi: 10.1016/j.jclinepi.2020.11.014. Epub 2020 Nov 25.
This study aimed to validate text mining of electronic healthcare records (EHRs) for trial eligibility screening and baseline data collection, by comparing its results with those of an international trial.
In three medical centers with different EHR vendors, EHR-based text mining was used to automatically screen patients for trial eligibility and to extract baseline data on 19 characteristics. First, the yield of automated EHR text-mining screening was compared with that of manual screening by research personnel. Second, the accuracy of baseline data extracted by EHR text mining was compared with manual data entry by research personnel.
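The second comparison can be illustrated with a minimal sketch. The abstract does not define how missingness and accuracy were computed, so the definitions below are assumptions: a data point counts as missed when the manual record holds a value and the extraction holds none, and accuracy is the share of extracted values that equal the manual value. The function name `compare_variable` and the example records are hypothetical.

```python
def compare_variable(manual, extracted):
    """Compare one baseline variable between manual entry and text mining.

    Both arguments map a patient id to a value; None means "no value".
    Returns (missing_fraction, accuracy) under the assumed definitions:
    - missing_fraction: manually recorded data points with no extracted value
    - accuracy: extracted values that equal the manual value
    """
    # Only data points that exist in the manual reference are evaluated.
    ids = [pid for pid, value in manual.items() if value is not None]
    if not ids:
        raise ValueError("manual reference contains no data points")

    present = [pid for pid in ids if extracted.get(pid) is not None]
    correct = [pid for pid in present if extracted[pid] == manual[pid]]

    missing_fraction = (len(ids) - len(present)) / len(ids)
    accuracy = len(correct) / len(present) if present else float("nan")
    return missing_fraction, accuracy


# Hypothetical records for a single yes/no characteristic.
manual_entry = {1: "yes", 2: "no", 3: "yes", 4: "no"}
text_mined = {1: "yes", 2: "yes", 3: None, 4: "no"}

missing_fraction, accuracy = compare_variable(manual_entry, text_mined)
print(f"missed: {missing_fraction:.1%}, accuracy: {accuracy:.1%}")
```

Applying such a function to each of the 19 characteristics and summarizing the results with a median and interquartile range would give summary figures of the kind reported below, though the study's actual procedure may differ.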
Of the 92,466 patients visiting the outpatient cardiology departments, 568 (0.6%) were enrolled in the trial during its recruitment period using manual screening methods. Automated EHR data screening of all patients showed that the number of patients needing to be screened could be reduced by 73,863 (79.9%). The remaining 18,603 (20.1%) contained 458 of the actual participants (82.4% of participants). In trial participants, automated EHR text mining missed a median of 2.8% (interquartile range [IQR] across all variables 0.4-8.5%) of all data points compared with manually collected data. The overall accuracy of automatically extracted data was 88.0% (IQR 84.7-92.8%).
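The screening percentages above follow directly from the reported counts, as this short check shows. The variable names are illustrative, and the two "patients screened per participant" ratios at the end are derived here for illustration only; they are not figures reported in the abstract.

```python
# Counts as reported in the abstract.
total_patients = 92_466
enrolled_manually = 568
excluded_by_text_mining = 73_863
remaining_to_screen = 18_603
participants_in_remaining = 458

# The excluded and remaining groups partition the full population.
assert excluded_by_text_mining + remaining_to_screen == total_patients


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


enrolled_pct = pct(enrolled_manually, total_patients)          # 0.6
reduction_pct = pct(excluded_by_text_mining, total_patients)   # 79.9
remaining_pct = pct(remaining_to_screen, total_patients)       # 20.1

# Derived for illustration (not reported in the abstract): patients that
# must be screened per identified participant, before and after filtering.
screened_per_participant_manual = total_patients / enrolled_manually
screened_per_participant_filtered = remaining_to_screen / participants_in_remaining

print(enrolled_pct, reduction_pct, remaining_pct)
print(round(screened_per_participant_manual, 1),
      round(screened_per_participant_filtered, 1))
```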
Automated text mining of EHRs can be used to identify trial participants and to collect baseline information.