Vulpius Siri A, Werge Sebastian, Jørgensen Isabella Friis, Siggaard Troels, Hernansanz Biel Jorge, Knudsen Gitte M, Brunak Søren, Pinborg Lars H
Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
Epilepsy Clinic and Neurobiology Research Unit, University Hospital Rigshospitalet, Copenhagen, Denmark.
Epilepsia. 2023 Oct;64(10):2750-2760. doi: 10.1111/epi.17734. Epub 2023 Aug 19.
Combining population-based health registries and electronic health records offers the opportunity to create large, phenotypically detailed patient cohorts of high quality. In this study, we used text mining of clinical notes to confirm International Classification of Diseases, 10th Revision (ICD-10)-registered epilepsy diagnoses and classify patients according to focal and generalized epilepsy types.
Using the Danish National Patient Registry, we identified patients who received an ICD-10 diagnosis of epilepsy between 2006 and 2016. To validate the epilepsy diagnosis and stratify patients into focal and generalized epilepsy types, we constructed dictionaries for text mining-based extraction of clinical notes. Two physicians manually reviewed the clinical notes for a total of 527 patients and assigned epilepsy diagnoses, which were compared with the text-mined diagnoses.
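As a rough illustration of dictionary-based extraction from clinical notes, the sketch below counts dictionary term mentions per epilepsy type and assigns the type with more mentions. It is a minimal sketch only: the English terms in `DICTIONARIES`, the function names `count_mentions` and `classify_patient`, and the majority-count rule are all hypothetical and are not taken from the study, whose actual dictionaries and decision rules are not described here.

```python
import re
from collections import Counter

# Hypothetical term dictionaries for illustration only; not the study's dictionaries.
DICTIONARIES = {
    "focal": ["focal epilepsy", "temporal lobe epilepsy", "focal seizure"],
    "generalized": ["generalized epilepsy", "juvenile myoclonic epilepsy", "absence seizure"],
}


def count_mentions(note):
    """Count whole-phrase, case-insensitive dictionary hits in one clinical note."""
    text = note.lower()
    counts = Counter()
    for label, terms in DICTIONARIES.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[label] += len(re.findall(pattern, text))
    return counts


def classify_patient(notes):
    """Assign an epilepsy type from all notes of one patient (assumed majority rule)."""
    total = Counter()
    for note in notes:
        total.update(count_mentions(note))
    focal, generalized = total["focal"], total["generalized"]
    if focal == 0 and generalized == 0:
        return "unassigned"  # no dictionary term found in any note
    if focal == generalized:
        return "ambiguous"   # equal evidence for both types
    return "focal" if focal > generalized else "generalized"
```

A realistic pipeline would also need to handle negation ("no signs of focal epilepsy"), spelling variants, and the language of the notes, none of which this sketch attempts.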
We identified 23 632 patients with an ICD-10 diagnosis of epilepsy, of whom 50% were registered with an unspecified epilepsy diagnosis. In total, 11 211 patients were considered likely to have epilepsy by text mining, with an F1 measure ranging from 82% to 90%. Manual review of the electronic health records for 310 patients revealed a false discovery rate of 29%. This rate was decreased to 4% by the text mining algorithm. The weighted average F1 measure for text mining-assigned epilepsy types was 79% (82% for focal and 76% for generalized epilepsy). Text mining successfully assigned a focal or generalized epilepsy type to 92% of the text mining-eligible patients registered with unspecified epilepsy.
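The measures reported above (F1, false discovery rate, and a weighted average F1 across epilepsy types) follow standard definitions, sketched below for reference. The function names are illustrative, and weighting each class's F1 by its number of patients is an assumption about how the weighted average was formed; the counts needed to reproduce the reported figures are not given in this abstract.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def false_discovery_rate(tp, fp):
    """Fraction of predicted positives that are false positives (1 - precision)."""
    return fp / (tp + fp) if (tp + fp) else 0.0


def weighted_f1(per_class):
    """Support-weighted mean of per-class F1 values.

    per_class is a list of (f1, support) pairs, where support is the
    number of patients in that class (assumed weighting scheme).
    """
    total = sum(support for _, support in per_class)
    if total == 0:
        return 0.0
    return sum(f1 * support for f1, support in per_class) / total
```

For example, under this weighting, per-class F1 values of 82% and 76% average to 79% when the two classes have equal support; whether the study's classes were equally sized is not stated here.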
Text mining of electronic health records can be used to establish a patient cohort with much higher likelihood of having a diagnosis of epilepsy and a focal or generalized epilepsy type compared to the cohort created from ICD-10 epilepsy codes alone. We believe the concept will be essential for future genome-wide and phenome-wide association studies and subsequently the development of precision medicine for epilepsy patients.