The feasibility of using natural language processing to extract clinical information from breast pathology reports.

Authors

Buckley Julliette M, Coopey Suzanne B, Sharko John, Polubriaginof Fernanda, Drohan Brian, Belli Ahmet K, Kim Elizabeth M H, Garber Judy E, Smith Barbara L, Gadd Michele A, Specht Michelle C, Roche Constance A, Gudewicz Thomas M, Hughes Kevin S

Affiliation

Department of Surgical Oncology, Massachusetts General Hospital, Boston, Massachusetts, USA.

Publication information

J Pathol Inform. 2012;3:23. doi: 10.4103/2153-3539.97788. Epub 2012 Jun 30.

Abstract

OBJECTIVE

The opportunity to integrate clinical decision support systems into clinical practice is limited due to the lack of structured, machine readable data in the current format of the electronic health record. Natural language processing has been designed to convert free text into machine readable data. The aim of the current study was to ascertain the feasibility of using natural language processing to extract clinical information from >76,000 breast pathology reports.

APPROACH AND PROCEDURE

Breast pathology reports from three institutions were analyzed using natural language processing software (Clearforest, Waltham, MA) to extract information on a variety of pathologic diagnoses of interest. Data tables were created from the extracted information according to date of surgery, side of surgery, and medical record number. The variety of ways in which each diagnosis could be represented was recorded, as a means of demonstrating the complexity of machine interpretation of free text.
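To make the extraction step concrete, the following is a minimal illustrative sketch of rule-based extraction with simple negation handling, with results tabulated by medical record number, date of surgery, and side of surgery. It is not the Clearforest software or the authors' rule set: the phrase patterns, negation cues, field names (`mrn`, `surgery_date`, `side`, `text`), and function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical phrase variants (assumption); the study recorded many more per diagnosis.
DIAGNOSIS_PATTERNS = {
    "invasive ductal carcinoma": re.compile(
        r"\b(?:invasive|infiltrating)\s+(?:ductal|duct)\s+(?:carcinoma|cancer)\b",
        re.IGNORECASE,
    ),
    "invasive lobular carcinoma": re.compile(
        r"\b(?:invasive|infiltrating)\s+lobular\s+(?:carcinoma|cancer)\b",
        re.IGNORECASE,
    ),
}

# Hypothetical negation cues (assumption), searched for earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no evidence of|negative for|no|not|without)\b", re.IGNORECASE)


def extract_diagnoses(text):
    """Return {diagnosis: True if affirmed, False if only negated} for mentioned diagnoses."""
    found = {}
    for name, pattern in DIAGNOSIS_PATTERNS.items():
        for match in pattern.finditer(text):
            # Text between the previous period and the start of the match.
            prefix = text[: match.start()].rsplit(".", 1)[-1]
            negated = bool(NEGATION.search(prefix))
            # One affirmative mention is enough to mark the diagnosis as present.
            found[name] = found.get(name, False) or not negated
    return found


def build_table(reports):
    """Group extracted diagnoses by (medical record number, surgery date, side)."""
    table = defaultdict(dict)
    for report in reports:
        key = (report["mrn"], report["surgery_date"], report["side"])
        row = table[key]
        for name, present in extract_diagnoses(report["text"]).items():
            row[name] = row.get(name, False) or present
    return dict(table)


# Made-up example report (not from the study data).
reports = [
    {
        "mrn": "A1",
        "surgery_date": "2010-01-05",
        "side": "left",
        "text": "Infiltrating ductal carcinoma, grade 2. "
                "No evidence of invasive lobular carcinoma.",
    }
]
table = build_table(reports)
```

A handful of regular expressions like these would not come close to covering the variation found in real reports, which is the difficulty the study set out to document.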

RESULTS

There was widespread variation in how pathologists reported common pathologic diagnoses. We report, for example, 124 ways of saying invasive ductal carcinoma and 95 ways of saying invasive lobular carcinoma. There were >4000 ways of saying invasive ductal carcinoma was not present. Natural language processor sensitivity and specificity were 99.1% and 96.5% when compared to expert human coders.
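For readers unfamiliar with the two metrics, the sketch below shows how sensitivity and specificity could be computed when NLP output is compared against expert human coding as the reference standard. The function name and the toy labels are assumptions for illustration; they are not the study's data or evaluation code.

```python
def sensitivity_specificity(nlp_labels, human_labels):
    """Compare NLP labels with reference (human) labels for one diagnosis.

    Each label is True if the diagnosis is coded as present in a report.
    Returns (sensitivity, specificity); a metric is None if undefined.
    """
    if len(nlp_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")

    tp = fp = tn = fn = 0
    for predicted, actual in zip(nlp_labels, human_labels):
        if actual and predicted:
            tp += 1  # present, and found by NLP
        elif actual and not predicted:
            fn += 1  # present, but missed by NLP
        elif not actual and predicted:
            fp += 1  # absent, but flagged by NLP
        else:
            tn += 1  # absent, and not flagged

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Toy labels (made up): five reports coded by NLP and by a human expert.
nlp = [True, True, False, False, True]
human = [True, True, False, True, False]
sens, spec = sensitivity_specificity(nlp, human)
```

Sensitivity is the share of reports the human coders marked positive that the NLP also marked positive, and specificity is the share of human-coded negatives that the NLP also marked negative.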

CONCLUSION

We have demonstrated how a large body of free-text medical information, such as that seen in breast pathology reports, can be converted to a machine-readable format using natural language processing, and have described the inherent complexities of the task.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebba/3424662/f168945f36a8/JPI-3-23-g001.jpg
