Department of Computer Science, Durham University, Durham, UK.
Centre for Health Informatics, Computing, and Statistics, Lancaster Medical School, Lancaster University, Lancaster, UK.
Sci Rep. 2023 Oct 21;13(1):18015. doi: 10.1038/s41598-023-45155-7.
Effective public health surveillance requires consistent monitoring of disease signals so that researchers and decision-makers can respond dynamically to changes in disease occurrence. However, whilst surveillance initiatives exist in production animal veterinary medicine, comparable frameworks for companion animals are lacking. First-opinion veterinary electronic health records (EHRs) have the potential to reveal disease signals and often represent the initial reporting of clinical syndromes in animals presenting for medical attention, which highlights their possible significance for early disease detection. Yet despite their availability, their free-text nature limits their use and prevents the production of national-level mortality and morbidity statistics. This paper presents PetBERT, a large language model trained on over 500 million words from 5.1 million EHRs across the UK. PetBERT-ICD extends PetBERT through additional training as a multi-label classifier that automatically codes veterinary clinical EHRs with the International Classification of Diseases 11th Revision (ICD-11) framework, achieving F1 scores exceeding 83% across 20 disease codings with minimal annotations. PetBERT-ICD effectively identifies disease outbreaks, detecting them up to 3 weeks earlier than current clinician-assigned point-of-care labelling strategies. The potential of PetBERT-ICD to enhance disease surveillance in veterinary medicine represents a promising avenue for advancing animal health and improving public health outcomes.
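To illustrate what multi-label coding and per-coding F1 scores mean, the following minimal sketch assumes a classifier head that outputs one logit per disease coding, with each coding passed through a sigmoid and thresholded independently at 0.5. The label names, logits and threshold are hypothetical and are not taken from PetBERT-ICD, which uses 20 ICD-11 codings.

```python
import math

# Hypothetical labels for illustration only; not PetBERT-ICD's actual codings.
LABELS = ["gastroenteric", "respiratory", "dermatological"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Multi-label decision (assumed threshold): each coding is scored
    independently, so one record may receive zero, one, or several codes."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def per_label_f1(y_true, y_pred):
    """F1 for each coding column: 2*TP / (2*TP + FP + FN)."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Three hypothetical EHRs: model logits and clinician-annotated gold codes.
logits = [[2.1, -1.3, 0.4], [-0.8, 1.7, -2.0], [0.3, 0.2, 1.5]]
y_true = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
y_pred = predict_labels(logits)

for name, f1 in zip(LABELS, per_label_f1(y_true, y_pred)):
    print(f"{name}: F1 = {f1:.2f}")
```

Reporting F1 per coding, rather than a single accuracy figure, keeps rare codings from being masked by common ones; with these toy inputs the first coding scores 1.00 and the other two 0.67.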