Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Neston, Wirral, United Kingdom.
Centre for Health Informatics, Computing, and Statistics (CHICAS), Lancaster Medical School, Lancaster University, Lancaster, United Kingdom.
PLoS One. 2021 Dec 9;16(12):e0260402. doi: 10.1371/journal.pone.0260402. eCollection 2021.
A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak, associated with acute vomiting in dogs, occurred in the UK between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR, supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance, requiring retrospective, often manual, coding; this limits real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate an unsupervised method of EHR annotation that uses latent Dirichlet allocation (LDA) topic modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics, which were used to annotate EHRs spanning the natural disease outbreak and to investigate whether any given topic might mirror the outbreak time-course. Each narrative was annotated with the topic best representing its text, using the LdaModel module of the Gensim library. Counts of narratives labelled with one of the topics significantly matched the disease outbreak as defined by the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topic showed a similar time course. Using artificially injected outbreaks, it was possible to identify other topics that would match other MPCs, including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and a freely available package (LDAVis), providing rapid insight into the clinical basis of each topic.
This work clearly shows that unsupervised record annotation using topic modelling, linked to simple text visualisations, can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously uncharacterised diseases based on changes in clinical narratives.
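The annotate-count-correlate steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the two-topic toy data and the weekly binning are all hypothetical, and the numbers are invented purely to exercise the code. Each narrative is represented by a list of (topic_id, probability) pairs, which is the shape Gensim's `LdaModel.get_document_topics` returns for a bag-of-words document; the Spearman correlation is written out in plain Python so the sketch has no dependencies.

```python
from collections import Counter


def dominant_topic(doc_topics):
    """Return the topic id with the highest probability for one narrative."""
    return max(doc_topics, key=lambda pair: pair[1])[0]


def weekly_topic_counts(records, topic_id, n_weeks):
    """Count, per week, the narratives whose dominant topic is topic_id.

    records: iterable of (week_index, doc_topics) pairs (hypothetical layout).
    """
    counts = Counter(
        week for week, doc_topics in records
        if dominant_topic(doc_topics) == topic_id
    )
    return [counts.get(week, 0) for week in range(n_weeks)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Toy data (invented): (week, per-narrative topic distribution).
records = [
    (0, [(0, 0.8), (1, 0.2)]),
    (0, [(0, 0.4), (1, 0.6)]),
    (1, [(0, 0.1), (1, 0.9)]),
    (1, [(0, 0.3), (1, 0.7)]),
    (1, [(0, 0.6), (1, 0.4)]),
    (2, [(0, 0.2), (1, 0.8)]),
    (2, [(0, 0.1), (1, 0.9)]),
    (2, [(0, 0.3), (1, 0.7)]),
    (2, [(0, 0.7), (1, 0.3)]),
    (3, [(0, 0.9), (1, 0.1)]),
]

# Weekly counts of narratives labelled with hypothetical topic 1.
topic_counts = weekly_topic_counts(records, topic_id=1, n_weeks=4)

# Invented weekly counts for a practitioner-assigned MPC over the same weeks.
mpc_counts = [2, 3, 5, 1]

rho = spearman(topic_counts, mpc_counts)
print(topic_counts, rho)
```

In a real setting the loop over topics would repeat this comparison for each of the model's topics against each MPC series, looking for the topic whose time course tracks the outbreak.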