Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts.
Harvard Medical School, Boston, Massachusetts.
Cancer Res. 2019 Nov 1;79(21):5463-5470. doi: 10.1158/0008-5472.CAN-19-0579. Epub 2019 Aug 8.
Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data.