Machine learning and deep learning tools for the automated capture of cancer surveillance data.

Affiliations

Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.

Advanced Computing for Health Sciences, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, USA.

Publication information

J Natl Cancer Inst Monogr. 2024 Aug 1;2024(65):145-151. doi: 10.1093/jncimonographs/lgae018.

PMID:39102883

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11300011/

Abstract

The National Cancer Institute and the Department of Energy strategic partnership applies advanced computing and predictive machine learning and deep learning models to automate the capture of information from unstructured clinical text for inclusion in cancer registries. Applications include extraction of key data elements from pathology reports, determination of whether a pathology or radiology report is related to cancer, extraction of relevant biomarker information, and identification of recurrence. With the growing complexity of cancer diagnosis and treatment, capturing essential information with purely manual methods is increasingly difficult. These new methods for applying advanced computational capabilities to automate data extraction represent an opportunity to close critical information gaps and create a nimble, flexible platform on which new information sources, such as genomics, can be added. This will ultimately provide a deeper understanding of the drivers of cancer and outcomes in the population and increase the timeliness of reporting. These advances will enable better understanding of how real-world patients are treated and the outcomes associated with those treatments in the context of our complex medical and social environment.

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Information extraction from multi-institutional radiology reports.

Artif Intell Med. 2016 Jan;66:29-39. doi: 10.1016/j.artmed.2015.09.007. Epub 2015 Oct 3.

Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.

J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

Toward real-time reporting of cancer incidence: methodology, pilot study, and SEER Program implementation.

J Natl Cancer Inst Monogr. 2024 Aug 1;2024(65):123-131. doi: 10.1093/jncimonographs/lgae024.

An Expandable Informatics Framework for Enhancing Central Cancer Registries with Digital Pathology Specimens, Computational Imaging Tools, and Advanced Mining Capabilities.

J Pathol Inform. 2022 Jan 5;13:5. doi: 10.4103/jpi.jpi_31_21. eCollection 2022.

The NCI All Ireland Cancer Conference.

Oncologist. 1999;4(4):275-277.

Using informatics to improve cancer surveillance.

J Am Med Inform Assoc. 2020 Jul 1;27(9):1488-1495. doi: 10.1093/jamia/ocaa149.

Rise of the Machines: Advances in Deep Learning for Cancer Diagnosis.

Trends Cancer. 2019 Mar;5(3):157-169. doi: 10.1016/j.trecan.2019.02.002. Epub 2019 Feb 28.

Cited by

A survey of NLP methods for oncology in the past decade with a focus on cancer registry applications.

Artif Intell Rev. 2025;58(10):314. doi: 10.1007/s10462-025-11316-5. Epub 2025 Jul 16.

Evaluating algorithmic bias on biomarker classification of breast cancer pathology reports.

JAMIA Open. 2025 May 9;8(3):ooaf033. doi: 10.1093/jamiaopen/ooaf033. eCollection 2025 Jun.

The SEER Program's evolution: supporting clinically meaningful population-level research.

J Natl Cancer Inst Monogr. 2024 Aug 1;2024(65):110-117. doi: 10.1093/jncimonographs/lgae022.
