Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model.

Affiliations

Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea.

Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea.

Publication information

Appl Clin Inform. 2022 May;13(3):521-531. doi: 10.1055/s-0042-1748144. Epub 2022 Jun 15.

Abstract

BACKGROUND

Cancer staging information is an essential component of cancer research. However, this information is primarily stored in free-text or semistructured clinical documents, which limits its use. Transforming cancer-specific data to the Observational Medical Outcome Partnership Common Data Model (OMOP CDM) allows the information to contribute to multicenter observational cancer studies. To the best of our knowledge, no studies to date have addressed OMOP CDM transformation and natural language processing (NLP) for thyroid cancer.

OBJECTIVE

We aimed to demonstrate the applicability of the OMOP CDM oncology extension module for thyroid cancer diagnosis and cancer stage information by processing free-text medical reports.

METHODS

Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was then derived from the transformed CDM data and populated in the CDM.
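To illustrate what rule-based extraction of this kind can look like, the following is a minimal sketch, not the authors' rule set. The keyword patterns, the `extract_modifiers` function, and the sample report text are all hypothetical, and the sketch stops short of mapping results to OMOP vocabulary concepts.

```python
import re

# Hypothetical keyword rules; the study's actual rules are not given in the abstract.
HISTOLOGY_RULES = [
    ("papillary carcinoma",
     re.compile(r"\bpapillary\s+(?:thyroid\s+)?(?:micro)?carcinoma\b", re.I)),
    ("follicular carcinoma",
     re.compile(r"\bfollicular\s+(?:thyroid\s+)?carcinoma\b", re.I)),
    ("medullary carcinoma",
     re.compile(r"\bmedullary\s+(?:thyroid\s+)?carcinoma\b", re.I)),
    ("anaplastic carcinoma",
     re.compile(r"\banaplastic\s+(?:thyroid\s+)?carcinoma\b", re.I)),
]

# Matches compact or spaced TNM strings such as "pT1aN1b" or "T2 N0 M0".
TNM_PATTERN = re.compile(
    r"\bp?T(?P<t>[0-4][ab]?|x)\s*,?\s*"
    r"p?N(?P<n>[01][ab]?|x)"
    r"(?:\s*,?\s*p?M(?P<m>[01x]))?",
    re.I,
)


def _normalize(prefix, value):
    """Return e.g. 'T1a' or 'NX'; None when the category was not found."""
    if value is None:
        return None
    value = value.upper() if value.lower() == "x" else value.lower()
    return prefix + value


def extract_modifiers(report):
    """Extract the first matching histology label and TNM categories."""
    histology = next(
        (label for label, pattern in HISTOLOGY_RULES if pattern.search(report)),
        None,
    )
    match = TNM_PATTERN.search(report)
    return {
        "histology": histology,
        "T": _normalize("T", match.group("t")) if match else None,
        "N": _normalize("N", match.group("n")) if match else None,
        "M": _normalize("M", match.group("m")) if match else None,
    }


if __name__ == "__main__":
    sample = ("Thyroid, right lobe, lobectomy: Papillary carcinoma, "
              "classic type. Pathologic stage: pT1aN1b")
    print(extract_modifiers(sample))
```

In a real pipeline, each extracted value would still need to be mapped to a standard concept and written to the appropriate CDM tables, which this sketch does not attempt.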

RESULTS

The extracted thyroid cancer data were completely converted into the OMOP CDM. The distribution of histopathological types of thyroid cancer was approximately 95.3 to 98.8% papillary carcinoma, 0.9 to 3.7% follicular carcinoma, 0.04 to 0.54% adenocarcinoma, 0.17 to 0.81% medullary carcinoma, and 0 to 0.3% anaplastic carcinoma. Regarding cancer staging, stage I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26% of the cases. Stage II and stage IV thyroid cancers were detected at low rates of 2 to 6%.

CONCLUSION

As the first study on OMOP CDM transformation and NLP for thyroid cancer, this study will help other institutions standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.
