Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system.

Affiliations

Neurology and Molecular Neuroscience Group, Institute of Life Science, Swansea University Medical School, Swansea University, Swansea, UK.

Health Data Research UK, Data Science Building, Swansea University Medical School, Swansea University, Swansea, UK.

Publication information

BMJ Open. 2019 Apr 1;9(4):e023232. doi: 10.1136/bmjopen-2018-023232.

Abstract

OBJECTIVE

Routinely collected healthcare data are a powerful research resource but often lack detailed disease-specific information that is collected in clinical free text, for example, clinic letters. We aim to use natural language processing techniques to extract detailed clinical information from epilepsy clinic letters to enrich routinely collected data.

DESIGN

We used the general architecture for text engineering (GATE) framework to build an information extraction system, ExECT (extraction of epilepsy clinical text), combining rule-based and statistical techniques. We extracted nine categories of epilepsy information in addition to clinic date and date of birth across 200 clinic letters. We compared the results of our algorithm with a manual review of the letters by an epilepsy clinician.
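To make the rule-based part of such a pipeline concrete, here is a minimal sketch in Python. It is not the authors' GATE pipeline or their rules: the drug list, the two regular expressions and the sample sentence are all hypothetical, chosen only to show how a hand-written rule can turn free text into structured items.

```python
import re

# Hypothetical mini-gazetteer of drug names (illustration only, not from the paper).
DRUGS = ["lamotrigine", "levetiracetam", "carbamazepine"]

# Rule 1 (assumed): a known drug name followed by a numeric dose and a unit.
DRUG_RULE = re.compile(
    r"\b(?P<drug>" + "|".join(DRUGS) + r")\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|g)\b",
    re.IGNORECASE,
)

# Rule 2 (assumed): "<number> seizure(s) per/a/every <period>".
FREQ_RULE = re.compile(
    r"\b(?P<count>\d+)\s+seizures?\s+(?:per|a|every)\s+(?P<period>day|week|month|year)\b",
    re.IGNORECASE,
)


def extract(letter: str) -> list[dict]:
    """Return structured items found in one letter by the two rules above."""
    items = []
    for m in DRUG_RULE.finditer(letter):
        items.append({
            "category": "medication",
            "drug": m["drug"].lower(),
            "dose": float(m["dose"]),
            "unit": m["unit"].lower(),
        })
    for m in FREQ_RULE.finditer(letter):
        items.append({
            "category": "seizure_frequency",
            "count": int(m["count"]),
            "period": m["period"].lower(),
        })
    return items


# Invented example sentence, not taken from any real clinic letter.
letter = "She continues on Lamotrigine 100 mg twice daily and reports 2 seizures per month."
for item in extract(letter):
    print(item)
```

Each extracted item can then be compared with a clinician's manual annotation of the same letter, which is how precision and recall are obtained.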

SETTING

De-identified and pseudonymised epilepsy clinic letters from a Health Board serving half a million residents in Wales, UK.

RESULTS

We identified 1925 items of information with overall precision, recall and F1 score of 91.4%, 81.4% and 86.1%, respectively. Precision and recall for epilepsy-specific categories were: epilepsy diagnosis (88.1%, 89.0%), epilepsy type (89.8%, 79.8%), focal seizures (96.2%, 69.7%), generalised seizures (88.8%, 52.3%), seizure frequency (86.3%, 53.6%), medication (96.1%, 94.0%), CT (55.6%, 58.8%), MRI (82.4%, 68.8%) and electroencephalogram (81.5%, 75.3%).
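The F1 score is the harmonic mean of precision and recall, so the overall figure can be checked from the two reported values. The short Python sketch below does that check; the function name `f1_score` is ours, and the per-category F1 values it prints are derived here from the reported precision and recall rather than stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the abstract.
overall = f1_score(0.914, 0.814)
print(f"overall F1: {overall:.1%}")  # prints "overall F1: 86.1%"

# Reported (precision, recall) pairs for a few categories; F1 is derived, not reported.
categories = {
    "epilepsy diagnosis": (0.881, 0.890),
    "generalised seizures": (0.888, 0.523),
    "medication": (0.961, 0.940),
}
for name, (p, r) in categories.items():
    print(f"{name}: F1 = {f1_score(p, r):.1%}")
```

Because the harmonic mean is pulled towards the lower of the two values, categories with high precision but low recall, such as generalised seizures, end up with a noticeably lower F1 than their precision alone would suggest.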

CONCLUSIONS

We have built an automated clinical text extraction system that can accurately extract epilepsy information from free text in clinic letters. This can enhance routinely collected data for research in the UK. The information extracted with ExECT, such as epilepsy type, seizure frequency and neurological investigations, is often missing from routinely collected data. We propose that our algorithm can bridge this data gap, enabling further epilepsy research opportunities. While many of the rules in our pipeline were tailored to extract epilepsy-specific information, our methods can be applied to other diseases and can also be used in clinical practice to record patient information in a structured manner.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc4b/6500195/ed3eb996dc83/bmjopen-2018-023232f01.jpg
