Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center.

Affiliations

Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada.

Institute for Health Policy Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada.

Publication information

JCO Clin Cancer Inform. 2023 Mar;7:e2200182. doi: 10.1200/CCI.22.00182.

Abstract

PURPOSE

This study documents the creation of an automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. The platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system.

METHODS

Data from six hospital information systems and one external data source were integrated on a nightly basis by automated extract/transform/load jobs. Free-text clinical documentation was processed using a commercial NLP engine.
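The nightly integration step can be illustrated with a minimal sketch. It assumes a single warehouse table keyed on patient and data element, into which each source system's extract is merged so that a later night's value replaces an earlier one. The table name `patient_facts`, the source names, and the sample rows are hypothetical; the abstract does not describe the actual schema or job design.

```python
import sqlite3


def load_nightly(conn, source_batches):
    """Merge one night's extracted rows into a single warehouse table.

    source_batches maps a (hypothetical) source-system name to a list of
    (patient_id, element, value) tuples extracted from that system.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient_facts ("
        "patient_id TEXT, element TEXT, value TEXT, source TEXT, "
        "PRIMARY KEY (patient_id, element))"
    )
    for source, rows in source_batches.items():
        for patient_id, element, value in rows:
            # Transform: normalize element names so sources agree on keys.
            element = element.strip().lower()
            # Load: a newer value for the same patient and element
            # replaces the stored one.
            conn.execute(
                "INSERT OR REPLACE INTO patient_facts VALUES (?, ?, ?, ?)",
                (patient_id, element, value, source),
            )
    conn.commit()


# Two illustrative nightly runs against an in-memory database.
conn = sqlite3.connect(":memory:")
load_nightly(conn, {
    "pathology": [("P001", " ER_Status ", "positive")],
    "registry": [("P001", "stage", "II")],
})
load_nightly(conn, {"pathology": [("P001", "er_status", "negative")]})
rows = conn.execute(
    "SELECT element, value FROM patient_facts ORDER BY element"
).fetchall()
```

After the second run, the table still holds one row per patient and element, with the most recent value kept.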

RESULTS

The platform contains 141 data elements for 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the tuning of NLP jobs found high overall performance: an F1 of 1.0 for 19 variables and an F1 > 0.95 for a further 16 variables.
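For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for one binary variable by comparing NLP output against reference labels. The function name and the toy labels are illustrative assumptions, not data or code from the study.

```python
def f1_score(gold, predicted, positive=True):
    """Return F1 for one binary variable, given parallel lists of labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference labels versus NLP output for one variable.
gold = [True, True, False, True, False]
nlp = [True, True, False, False, False]
score = f1_score(gold, nlp)
```

In this toy case the NLP output misses one positive and raises no false alarms, so precision is 1.0 and recall is 2/3. An F1 of 1.0 requires both to be perfect.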

CONCLUSION

This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform to enable a learning health system. Although the upfront time investment required to create the platform was considerable, daily data processing now completes automatically in less than an hour.
