Extracting seizure frequency from epilepsy clinic notes: a machine reading approach to natural language processing.

Affiliations

Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

J Am Med Inform Assoc. 2022 Apr 13;29(5):873-881. doi: 10.1093/jamia/ocac018.

Abstract

OBJECTIVE

Seizure frequency and seizure freedom are among the most important outcome measures for patients with epilepsy. In this study, we aimed to automatically extract this clinical information from unstructured text in clinical notes. If successful, this could improve clinical decision-making in epilepsy patients and allow for rapid, large-scale retrospective research.

MATERIALS AND METHODS

We developed a finetuning pipeline for pretrained neural models to classify patients as being seizure-free and to extract text containing their seizure frequency and date of last seizure from clinical notes. We annotated 1000 notes for use as training and testing data and determined how well 3 pretrained neural models, BERT, RoBERTa, and Bio_ClinicalBERT, could identify and extract the desired information after finetuning.
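As a rough illustration of the two tasks described above, the sketch below shows one plausible way to represent an annotated note and to turn a text-extraction annotation into a SQuAD-style record, a format commonly used when finetuning BERT-family models for extractive reading. The abstract does not describe the annotation schema, so the `AnnotatedNote` fields, the `to_extraction_example` helper, the question wording, and the sample note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedNote:
    """One annotated clinic note (hypothetical schema, not taken from the paper)."""
    text: str
    seizure_free: Optional[bool]       # classification label; None if the note does not say
    frequency_span: Optional[str]      # verbatim text stating seizure frequency
    last_seizure_span: Optional[str]   # verbatim text stating date of last seizure


def to_extraction_example(note_text: str, question: str, answer_span: Optional[str]) -> dict:
    """Build a SQuAD-style record; the answer is located by character offset."""
    if answer_span is None:
        # Unanswerable case: the note contains no text for this question.
        answers = {"text": [], "answer_start": []}
    else:
        start = note_text.find(answer_span)
        if start < 0:
            raise ValueError("answer span must appear verbatim in the note")
        answers = {"text": [answer_span], "answer_start": [start]}
    return {"question": question, "context": note_text, "answers": answers}


# Invented example note, for illustration only.
note = AnnotatedNote(
    text="She reports two focal seizures per month. Her last seizure was on March 3.",
    seizure_free=False,
    frequency_span="two focal seizures per month",
    last_seizure_span="March 3",
)

example = to_extraction_example(
    note.text,
    "How often does the patient have seizures?",  # hypothetical question wording
    note.frequency_span,
)
```

Records in this shape can be passed to an extractive question-answering head, while the `seizure_free` label would feed a separate sequence-classification head.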

RESULTS

The finetuned models (BERTFT, Bio_ClinicalBERTFT, and RoBERTaFT) achieved near-human performance when classifying patients as seizure free, with BERTFT and Bio_ClinicalBERTFT achieving accuracy scores over 80%. All 3 models also achieved human performance when extracting seizure frequency and date of last seizure, with overall F1 scores over 0.80. The best combination of models was Bio_ClinicalBERTFT for classification, and RoBERTaFT for text extraction. Most of the gains in performance due to finetuning required roughly 70 annotated notes.
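The abstract does not state how the F1 scores for text extraction were computed. A common choice for extractive tasks is the token-overlap F1 used in SQuAD-style evaluation, and the minimal sketch below assumes that definition; the `token_f1` name and the whitespace tokenization are simplifications chosen for illustration.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span.

    Assumes lowercase, whitespace-split tokens (a simplification of the
    normalization used in SQuAD-style evaluation scripts).
    """
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction of "two seizures" against a reference of "two seizures per month" has precision 1.0 and recall 0.5, giving an F1 of about 0.67. An overall score would then be an average over all notes.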

DISCUSSION AND CONCLUSION

Our novel machine reading approach to extracting important clinical outcomes performed at or near human performance on several tasks. This approach opens new possibilities to support clinical practice and conduct large-scale retrospective clinical research. Future studies can use our finetuning pipeline with minimal training annotations to answer new clinical questions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a94/9006692/13f792c89ab6/ocac018f1.jpg
