Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada.
Institute for Health Policy Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada.
JCO Clin Cancer Inform. 2023 Mar;7:e2200182. doi: 10.1200/CCI.22.00182.
This study documents the creation of an automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. The platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system.
Data from six hospital information systems and one external data source were integrated on a nightly basis by automated extract/transform/load jobs. Free-text clinical documentation was processed using a commercial NLP engine.
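The nightly integration step can be pictured with a minimal sketch. The abstract does not describe the actual jobs, schemas, or tooling, so everything below is an assumption for illustration: the source names, the record fields (`id`, `field`, `value`), and the in-memory dictionary standing in for the warehouse are all hypothetical.

```python
from datetime import date


def extract(sources):
    """Yield (source_name, record) pairs from every source system."""
    for name, records in sources.items():
        for record in records:
            yield name, record


def transform(source_name, record):
    """Normalize one raw record into a common shape (hypothetical schema)."""
    return {
        "patient_id": str(record["id"]).strip(),
        "source": source_name,
        "element": record["field"].lower(),
        "value": record["value"],
    }


def load(warehouse, rows, run_date):
    """Upsert rows keyed on (patient_id, element); a later load overwrites."""
    for row in rows:
        key = (row["patient_id"], row["element"])
        warehouse[key] = {**row, "loaded_on": run_date.isoformat()}
    return warehouse


def nightly_job(sources, warehouse, run_date):
    """Run extract, transform, and load once over all sources."""
    rows = (transform(name, rec) for name, rec in extract(sources))
    return load(warehouse, rows, run_date)


# Hypothetical input: two source systems describing the same patient.
sources = {
    "pathology": [{"id": 1, "field": "ER_Status", "value": "positive"}],
    "registry": [{"id": " 1 ", "field": "Stage", "value": "II"}],
}
warehouse = nightly_job(sources, {}, date(2022, 6, 3))
```

Keying the load on patient and data element makes a rerun idempotent, which is one common way to keep a scheduled job safe to repeat; whether the platform described here works this way is not stated.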
The platform contains 141 data elements for 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the tuned NLP jobs found high performance overall: 19 variables had an F1 of 1.0, and a further 16 variables had an F1 of > 0.95.
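For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives for a single extracted variable. The function name and the example counts are illustrative assumptions, and the abstract does not state what reference standard the NLP output was compared against.

```python
def f1_score(tp, fp, fn):
    """Return F1 from true-positive, false-positive, and false-negative counts.

    Uses the identity F1 = 2*TP / (2*TP + FP + FN), which is algebraically
    equal to the harmonic mean of precision and recall.
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No positives in either the reference or the predictions;
        # returning 0.0 here is a convention, not a universal rule.
        return 0.0
    return 2 * tp / denominator


# Illustrative counts only (not taken from the study).
perfect = f1_score(tp=10, fp=0, fn=0)   # every extraction correct
partial = f1_score(tp=8, fp=2, fn=2)    # some misses and false alarms
```

An F1 of 1.0 therefore means the extraction produced no false positives and no false negatives on the evaluation set for that variable.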
This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform that enables a learning health system. Although the upfront time investment required to create the platform was considerable, daily data processing now completes automatically in less than an hour.