Cutforth Murray, Watson Hannah, Brown Cameron, Wang Chaoyang, Thomson Stuart, Fell Dickon, Dilys Vismantas, Scrimgeour Morag, Schrempf Patrick, Lesh James, Muir Keith, Weir Alexander, O'Neil Alison Q
Canon Medical Research Europe, Edinburgh, United Kingdom.
Institute of Neuroscience & Psychology, University of Glasgow, Glasgow, United Kingdom.
Front Digit Health. 2023 Jun 14;5:1186516. doi: 10.3389/fdgth.2023.1186516. eCollection 2023.
Thrombolysis treatment for acute ischaemic stroke can lead to better outcomes if administered early enough. However, contraindications exist which put the patient at greater risk of a bleed (e.g. recent major surgery, anticoagulant medication). Therefore, clinicians must check a patient's past medical history before proceeding with treatment. In this work we present a machine learning approach for accurate automatic detection of this information in unstructured text documents such as discharge letters or referral letters, to support the clinician in making a decision about whether to administer thrombolysis.
We consulted local and national guidelines for thrombolysis eligibility, identifying 86 entities which are relevant to the thrombolysis decision. A total of 8,067 documents from 2,912 patients were manually annotated with these entities by medical students and clinicians. Using this data, we trained and validated several transformer-based named entity recognition (NER) models, focusing on transformer models which have been pre-trained on a biomedical corpus as these have shown most promise in the biomedical NER literature.
Our best model was a PubMedBERT-based approach, which obtained a lenient micro/macro F1 score of 0.829/0.723. Ensembling 5 variants of this model gave a significant boost to precision, obtaining a micro/macro F1 of 0.846/0.734, which approaches the human annotator performance of 0.847/0.839. We further propose numeric definitions for the concepts of name regularity (similarity of all spans which refer to an entity) and context regularity (similarity of all context surrounding mentions of an entity). We use these to analyse the types of errors made by the system, and find that the name regularity of an entity is a stronger predictor of model performance than raw training set frequency.
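To make the reported metric concrete, the following is a minimal Python sketch of how a lenient micro/macro F1 score could be computed for span-level NER output. It is an illustration only, not the authors' evaluation code: the abstract does not define the lenient criterion, so the sketch assumes that a predicted span counts as correct when it overlaps a gold span with the same label. The function names, the span format `(start, end, label)` and the entity labels are all hypothetical.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if half-open character spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def f1(matched_pred, n_pred, matched_gold, n_gold):
    """F1 from lenient counts; precision and recall use separate match counts."""
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def lenient_f1(gold, pred):
    """Return (micro F1, macro F1, per-label F1) under overlap-based matching.

    Assumption: a span is matched if a span with the same label on the other
    side overlaps it by at least one character.
    """
    # Per label: [matched_pred, n_pred, matched_gold, n_gold]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for start, end, label in pred:
        counts[label][1] += 1
        if any(l == label and overlaps((start, end), (s, e)) for s, e, l in gold):
            counts[label][0] += 1
    for start, end, label in gold:
        counts[label][3] += 1
        if any(l == label and overlaps((start, end), (s, e)) for s, e, l in pred):
            counts[label][2] += 1

    per_label = {label: f1(*c) for label, c in counts.items()}
    # Micro: pool the counts over all labels, then compute a single F1.
    totals = [sum(c[i] for c in counts.values()) for i in range(4)]
    micro = f1(*totals)
    # Macro: unweighted mean of the per-label F1 scores.
    macro = sum(per_label.values()) / len(per_label) if per_label else 0.0
    return micro, macro, per_label


# Hypothetical spans: (start offset, end offset, entity label).
gold = [(0, 8, "anticoagulant"), (20, 33, "recent_surgery"), (40, 48, "anticoagulant")]
pred = [(0, 5, "anticoagulant"), (40, 48, "anticoagulant"), (60, 70, "recent_surgery")]

micro, macro, per_label = lenient_f1(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy example the frequent label is predicted well and the rare one is missed, so the macro score falls below the micro score. The same pattern, where rarer or harder entities pull the macro average down, is one plausible reading of the gap between the reported micro and macro figures.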
Overall, this work shows the potential of machine learning to provide clinical decision support (CDS) for the time-critical decision of thrombolysis administration in ischaemic stroke by quickly surfacing relevant information, leading to prompt treatment and hence to better patient outcomes.