Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Advanced Computing for Health Sciences, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
J Natl Cancer Inst Monogr. 2024 Aug 1;2024(65):145-151. doi: 10.1093/jncimonographs/lgae018.
The National Cancer Institute and the Department of Energy strategic partnership applies advanced computing and predictive machine learning and deep learning models to automate the capture of information from unstructured clinical text for inclusion in cancer registries. Applications include extraction of key data elements from pathology reports, determination of whether a pathology or radiology report is related to cancer, extraction of relevant biomarker information, and identification of recurrence. With the growing complexity of cancer diagnosis and treatment, capturing essential information with purely manual methods is increasingly difficult. These new methods for applying advanced computational capabilities to automate data extraction represent an opportunity to close critical information gaps and create a nimble, flexible platform on which new information sources, such as genomics, can be added. This will ultimately provide a deeper understanding of the drivers of cancer and outcomes in the population and increase the timeliness of reporting. These advances will enable better understanding of how real-world patients are treated and the outcomes associated with those treatments in the context of our complex medical and social environment.