Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods.

Author information

Patel Tejal A, Puppala Mamta, Ogunti Richard O, Ensor Joe E, He Tiancheng, Shewale Jitesh B, Ankerst Donna P, Kaklamani Virginia G, Rodriguez Angel A, Wong Stephen T C, Chang Jenny C

Affiliations

Houston Methodist Cancer Center, Houston, Texas.

Cancer Research Program, Houston Methodist Research Institute, Houston, Texas.

Publication information

Cancer. 2017 Jan 1;123(1):114-121. doi: 10.1002/cncr.30245. Epub 2016 Aug 29.

Abstract

BACKGROUND

A key challenge to mining electronic health records for mammography research is the preponderance of unstructured narrative text, which strikingly limits usable output. The imaging characteristics of breast cancer subtypes have been described previously, but without standardization of parameters for data mining.

METHODS

The authors searched the enterprise-wide data warehouse at the Houston Methodist Hospital, the Methodist Environment for Translational Enhancement and Outcomes Research (METEOR), for patients with Breast Imaging Reporting and Data System (BI-RADS) category 5 mammogram readings performed between January 2006 and May 2015 and an available pathology report. The authors developed natural language processing (NLP) software algorithms to automatically extract mammographic and pathologic findings from free text mammogram and pathology reports. The correlation between mammographic imaging features and breast cancer subtype was analyzed using one-way analysis of variance and the Fisher exact test.
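The abstract does not describe how the NLP algorithms work internally. As a rough illustration only, a rule-based extractor for free-text mammogram reports could be sketched as follows. The keyword patterns, the negation rule, the function name `extract_findings`, and the sample report are all assumptions made for this example and are not taken from the study.

```python
import re

# Hypothetical keyword patterns; the study's actual lexicon is not given in the abstract.
FEATURE_PATTERNS = {
    "spiculated_margin": re.compile(r"\bspiculat(?:ed|ion)\b", re.IGNORECASE),
    "pleomorphic_calcifications": re.compile(
        r"\bpleomorphic\b[^.]*\bcalcifications?\b", re.IGNORECASE
    ),
    "heterogeneous_calcifications": re.compile(
        r"\bheterogeneous\b[^.]*\bcalcifications?\b", re.IGNORECASE
    ),
}

# BI-RADS category: the first digit 0-6 within a short window after the term.
BIRADS_PATTERN = re.compile(r"\bBI-?RADS\b[^0-9]{0,20}([0-6])", re.IGNORECASE)

# Very simple negation cue (an assumption): a cue word earlier in the same sentence.
NEGATION_CUE = re.compile(r"\b(?:no|not|without)\b", re.IGNORECASE)


def extract_findings(report: str) -> dict:
    """Return feature flags and the BI-RADS category found in one report."""
    findings = {name: False for name in FEATURE_PATTERNS}
    # Split on periods and newlines so negation is checked per sentence.
    for sentence in re.split(r"[.\n]+", report):
        for name, pattern in FEATURE_PATTERNS.items():
            match = pattern.search(sentence)
            # Count the feature only if no negation cue precedes it in the sentence.
            if match and not NEGATION_CUE.search(sentence[: match.start()]):
                findings[name] = True
    birads = BIRADS_PATTERN.search(report)
    findings["birads"] = int(birads.group(1)) if birads else None
    return findings


# Invented sample text, not a real report.
sample = (
    "There is a 2 cm mass with spiculated margins in the left breast. "
    "No pleomorphic calcifications are seen. "
    "BI-RADS category 5: highly suggestive of malignancy."
)
print(extract_findings(sample))
```

A production system would need a richer lexicon, proper sentence segmentation, and more careful negation handling than this keyword approach provides.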

RESULTS

The NLP algorithm was able to obtain key characteristics for 543 patients who met the inclusion criteria. Patients with estrogen receptor-positive tumors were more likely to have spiculated margins (P = .0008), and those with tumors that overexpressed human epidermal growth factor receptor 2 (HER2) were more likely to have heterogeneous and pleomorphic calcifications (P = .0078 and P = .0002, respectively).
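P values for associations of this kind can be obtained from the Fisher exact test named in the methods. The sketch below computes a two-sided Fisher exact P value for a 2x2 table using only the Python standard library. The function name `fisher_exact_two_sided` is an assumption, and the counts in the usage line are a textbook-style toy table, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, receptor status (positive / negative) and
    columns the presence or absence of an imaging feature.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-9):
            total += p
    return total


# Toy counts for illustration only; these are NOT the study's data.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

The one-way analysis of variance also mentioned in the methods is not covered by this sketch.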

CONCLUSIONS

Mammographic imaging characteristics, obtained from an automated text search and the extraction of mammogram reports using NLP techniques, correlated with pathologic breast cancer subtype. The results of the current study validate previously reported trends assessed by manual data collection. Furthermore, NLP provides an automated means with which to scale up data extraction and analysis for clinical decision support. Cancer 2017;123:114-121. © 2016 American Cancer Society.
