Pathology report data extraction from relational database using R, with extraction from reports on melanoma of skin as an example.

Author information

Ye Jay J

Affiliation

Dahl-Chase Pathology Associates, Bangor, Maine, USA.

Publication information

J Pathol Inform. 2016 Oct 21;7:44. doi: 10.4103/2153-3539.192822. eCollection 2016.

Abstract

BACKGROUND

Various methods have been described for extracting data from pathology reports, with varying degrees of success. Here, a technique for extracting data directly from a relational database is described.

METHODS

Our department uses synoptic reports modified from the College of American Pathologists (CAP) Cancer Protocol Templates to report most of our cancer diagnoses. Using the melanoma of skin synoptic report as an example, the R scripting language, extended with the RODBC package, was used to query the pathology information system database. Reports containing a melanoma of skin synoptic report from the past 4.5 years were retrieved, and individual data elements were extracted. Using the retrieved list of cases, the database was queried a second time to retrieve and extract lymph node staging information from subsequent reports on the same patients.
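The two-pass retrieval described above can be sketched as follows. This is not the author's code: the paper used R with RODBC against the production database, whereas this sketch swaps in Python's built-in `sqlite3` with an in-memory database so it runs on its own. The `reports` table, its columns (`case_id`, `patient_id`, `signout`, `report_text`), the synoptic header text, and the `Label: value` line format are all hypothetical stand-ins for whatever the real pathology information system stores.

```python
import re
import sqlite3

# Hypothetical schema standing in for the pathology information system
# database; a real system will use different table and column names.
SCHEMA = """
CREATE TABLE reports (
    case_id     TEXT PRIMARY KEY,
    patient_id  TEXT NOT NULL,
    signout     TEXT NOT NULL,  -- ISO 8601 date, so string order == date order
    report_text TEXT NOT NULL
);
"""

# Hypothetical sample rows: two synoptic melanoma reports and one later
# sentinel lymph node report from the same patient as the first.
SAMPLE = [
    ("S15-001", "P1", "2015-03-02",
     "MELANOMA OF SKIN SYNOPTIC REPORT\nBreslow depth: 1.2 mm\nUlceration: Absent"),
    ("S15-050", "P1", "2015-04-10",
     "Sentinel lymph node biopsy. Pathologic staging (pTNM): pT2a pN1a"),
    ("S16-007", "P2", "2016-01-15",
     "MELANOMA OF SKIN SYNOPTIC REPORT\nBreslow depth: 0.4 mm\nUlceration: Absent"),
]

# Matches "Label: value" lines of the (hypothetical) synoptic format.
ELEMENT_RE = re.compile(r"^(Breslow depth|Ulceration):[ \t]*(.+)$", re.MULTILINE)


def build_db():
    """Create and populate the in-memory stand-in database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO reports VALUES (?, ?, ?, ?)", SAMPLE)
    return con


def extract_synoptic(con):
    """Pass 1: fetch reports containing the synoptic header and parse elements."""
    rows = con.execute(
        "SELECT case_id, patient_id, signout, report_text FROM reports "
        "WHERE report_text LIKE ? ORDER BY signout",
        ("%MELANOMA OF SKIN SYNOPTIC REPORT%",),
    ).fetchall()
    cases = []
    for case_id, patient_id, signout, text in rows:
        record = {"case_id": case_id, "patient_id": patient_id, "signout": signout}
        record.update({label: value.strip() for label, value in ELEMENT_RE.findall(text)})
        cases.append(record)
    return cases


def subsequent_reports(con, case):
    """Pass 2: later reports on the same patient, to look for updated pN staging."""
    return con.execute(
        "SELECT case_id, report_text FROM reports "
        "WHERE patient_id = ? AND signout > ? ORDER BY signout",
        (case["patient_id"], case["signout"]),
    ).fetchall()
```

The second query is keyed on the patient and the sign-out date of the index case, which mirrors the paper's idea of reusing the first result list to drive the follow-up query. In a real system, the patient key and date column would come from the vendor's schema.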

RESULTS

A total of 426 synoptic reports corresponding to unique melanoma of skin lesions were retrieved, and the data elements of interest were extracted into an R data frame. The distribution of Breslow depth grouped by year is used as an example of intra-report data extraction and analysis. When new pN staging information was present in subsequent reports, 82% (77/94) was retrieved precisely (pN0, pN1, pN2, or pN3). An additional 15% (14/94) was retrieved with some ambiguity (i.e., only that the nodes were positive, or only that the staging had been updated). Specificity was 100% for both. The relationship between Breslow depth and lymph node status was graphed as an example of lesion-specific, multi-report data extraction and analysis.
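The two analyses mentioned above, classifying pN retrieval as precise or ambiguous and grouping Breslow depth by year, can be sketched roughly as below. Again this is Python rather than the paper's R, and the wording cues are assumptions. The "ambiguous" phrases (`positive for metastatic melanoma`, `staging ... updated`) are hypothetical examples of report language that signals a positive node or an update without stating a pN category. The case dictionaries are assumed to have the shape produced by a first-pass extraction (a `signout` ISO date and a `Breslow depth` string such as `"1.2 mm"`).

```python
import re
from collections import defaultdict

# Precise: an explicit pN category such as "pN0", "pN1a", "pN2b", "pN3".
PN_RE = re.compile(r"\bpN([0-3])[a-c]?\b")

# Ambiguous (hypothetical wording): nodes reported positive without a
# category, or a note that the staging was updated.
AMBIGUOUS_RE = re.compile(
    r"positive for metastatic melanoma|staging (?:is |has been )?updated",
    re.IGNORECASE,
)


def classify_pn(text):
    """Return ("precise", "pN<n>"), ("ambiguous", None) or ("none", None)."""
    match = PN_RE.search(text)
    if match:
        return ("precise", "pN" + match.group(1))
    if AMBIGUOUS_RE.search(text):
        return ("ambiguous", None)
    return ("none", None)


def breslow_by_year(cases):
    """Group numeric Breslow depths (mm) by sign-out year, sorted within each year."""
    groups = defaultdict(list)
    for case in cases:
        match = re.match(r"\s*(\d+(?:\.\d+)?)\s*mm", case.get("Breslow depth", ""))
        if match:
            groups[case["signout"][:4]].append(float(match.group(1)))
    return {year: sorted(depths) for year, depths in sorted(groups.items())}


if __name__ == "__main__":
    print(classify_pn("Pathologic staging (pTNM): pT2a pN1a"))
    print(breslow_by_year([{"signout": "2015-03-02", "Breslow depth": "1.2 mm"}]))
```

Checking for an explicit pN category before the looser cues means that a report carrying both is counted as precise. This matches the abstract's split between precise retrieval and retrieval "with some ambiguity".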

CONCLUSIONS

R extended with the RODBC package is a simple and versatile approach that is well suited to the above tasks. The success or failure of retrieval and extraction depended largely on whether the reports were formatted and whether the contents of the elements were consistently phrased. This approach can easily be modified and adapted for other pathology information systems that use a relational database for data management.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94d5/5100200/e8c0f5d9c77f/JPI-7-44-g001.jpg