Davies Heather, Nenadic Goran, Alfattni Ghada, Arguello Casteleiro Mercedes, Al Moubayed Noura, Farrell Sean, Radford Alan D, Noble P-J M
Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom.
Department of Computer Science, Manchester University, Manchester, United Kingdom.
Front Vet Sci. 2024 Aug 22;11:1352726. doi: 10.3389/fvets.2024.1352726. eCollection 2024.
In part two of this mini-series, we evaluate the range of machine-learning tools now available for veterinary clinical text-mining. These tools will be vital for automating information extraction from the large datasets of veterinary clinical narratives curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass. At volumes of millions of records, manual reading is impractical, and the complexity of clinical notes limits the usefulness of more "traditional" text-mining approaches. We discuss the application of machine-learning techniques ranging from simple models that identify words and phrases with similar meanings, which can be used to expand lexicons for keyword searching, to more complex language models. Specifically, we describe the use of language models for record annotation and of unsupervised approaches for identifying topics within large datasets, and we discuss more recent developments in generative models (such as ChatGPT). As these models become increasingly complex, it is important that researchers and clinicians work together to ensure that model outputs are explainable, in order to instill confidence in any conclusions drawn from them.
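The lexicon-expansion idea mentioned in the abstract (using a model of word similarity to grow a keyword list) can be illustrated with a minimal sketch. It assumes word vectors are already available and ranks candidate terms by cosine similarity to each seed term. The toy vectors, the `expand_lexicon` function, and the `threshold` and `top_k` parameters are hypothetical and chosen for illustration only. Cosine similarity over word embeddings is one common way to do this, not necessarily the method used by the authors.

```python
import math

# Hypothetical toy word vectors, for illustration only. In practice these would
# come from an embedding model trained on veterinary clinical narratives.
TOY_VECTORS = {
    "vomiting": [0.90, 0.10, 0.00],
    "vomited": [0.88, 0.12, 0.02],
    "emesis": [0.85, 0.15, 0.05],
    "vom": [0.80, 0.20, 0.05],
    "diarrhoea": [0.10, 0.90, 0.10],
    "pruritus": [0.00, 0.10, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def expand_lexicon(seeds, vectors, threshold=0.95, top_k=5):
    """Add up to top_k nearest neighbours of each seed whose similarity meets threshold."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue  # no vector for this seed, so it cannot be expanded
        scored = sorted(
            ((cosine(vectors[seed], vec), word)
             for word, vec in vectors.items() if word != seed),
            reverse=True,  # most similar candidates first
        )
        for score, word in scored[:top_k]:
            if score >= threshold:
                expanded.add(word)
    return expanded


print(sorted(expand_lexicon(["vomiting"], TOY_VECTORS)))
# → ['emesis', 'vom', 'vomited', 'vomiting']
```

The expanded list can then be used for ordinary keyword searching. In practice, the threshold would need tuning and the candidate terms would usually be reviewed by a clinician before use.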