Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States; Division of Hematology/Oncology, Medical University of South Carolina, Charleston, SC, United States.
Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States.
Int J Med Inform. 2019 Sep;129:13-19. doi: 10.1016/j.ijmedinf.2019.05.018. Epub 2019 May 23.
Insufficient patient enrollment in clinical trials remains a serious and costly problem and is often considered the most critical issue facing the clinical trials community. In this project, we assessed the feasibility of automatically detecting a patient's eligibility for a sample of breast cancer clinical trials by mapping coded clinical trial eligibility criteria to the corresponding clinical information automatically extracted from text in the electronic health record (EHR).
Three open breast cancer clinical trials were selected by oncologists. Their eligibility criteria were manually abstracted from trial descriptions using the OHDSI ATLAS web application. Patients enrolled in or screened for these trials were selected as 'positive' or 'possible' cases. Other patients diagnosed with breast cancer were selected as 'negative' cases. A selection of the clinical data and all clinical notes of these 229 patients were extracted from the MUSC clinical data warehouse and stored in a database implementing the OMOP common data model. Eligibility criteria were extracted from clinical notes using either manually crafted pattern matching (regular expressions) or a new natural language processing (NLP) application. The extracted criteria were then compared with reference criteria from the trial descriptions. This comparison was performed with three different versions of a new application: rule-based, cosine similarity-based, and machine learning-based.
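As a rough illustration of the regular-expression extraction and the cosine similarity-based comparison described above, the following minimal sketch extracts a few criteria from note text and scores them against a trial's reference criteria. The patterns, criterion labels, note text, and function names are all hypothetical and chosen for illustration; the abstract does not specify the expressions or the vector representation actually used in the study.

```python
import math
import re

# Hypothetical patterns; the study's actual expressions are not given in the abstract.
PATTERNS = {
    "er_positive": re.compile(
        r"\b(?:ER|estrogen receptor)\s*(?:\+|positive)", re.IGNORECASE
    ),
    "her2_negative": re.compile(
        r"\bHER-?2(?:/neu)?\s*(?:-|negative)", re.IGNORECASE
    ),
    "stage_ii": re.compile(r"\bstage\s+(?:II|2)\b", re.IGNORECASE),
}


def extract_criteria(note_text):
    """Return the set of criterion labels whose pattern matches the note."""
    return {label for label, rx in PATTERNS.items() if rx.search(note_text)}


def cosine_similarity(a, b):
    """Cosine similarity of two sets treated as binary indicator vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical reference criteria for one trial and two made-up notes.
trial_criteria = {"er_positive", "her2_negative", "stage_ii"}
note_a = "ER positive, HER2 negative, stage II invasive ductal carcinoma."
note_b = "ER+ tumor, HER2 positive, stage III."

criteria_a = extract_criteria(note_a)
criteria_b = extract_criteria(note_b)
score_a = cosine_similarity(trial_criteria, criteria_a)
score_b = cosine_similarity(trial_criteria, criteria_b)
print(sorted(criteria_a), round(score_a, 3))
print(sorted(criteria_b), round(score_b, 3))
```

In this sketch a patient whose extracted criteria overlap more with the trial's reference criteria receives a higher score; choosing the threshold that separates likely eligible from ineligible patients would be a separate decision.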
For eligibility criteria extraction from clinical notes, the machine learning-based NLP application achieved the highest accuracy, with a micro-averaged recall of 90.9% and precision of 89.7%. For trial eligibility determination, the machine learning-based approach also reached the highest accuracy, with a per-trial AUC between 75.5% and 89.8%.
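For readers unfamiliar with micro-averaging, the sketch below shows how micro-averaged precision and recall pool true positives, false positives, and false negatives across all criterion types before computing the ratios. The criterion names and counts are invented for illustration and are not the study's data.

```python
def micro_average(counts):
    """Pool TP/FP/FN over all criterion types, then compute precision and recall."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative counts only (hypothetical, not taken from the study).
counts = {
    "er_status": {"tp": 90, "fp": 5, "fn": 10},
    "her2_status": {"tp": 40, "fp": 10, "fn": 5},
    "stage": {"tp": 20, "fp": 5, "fn": 5},
}

precision, recall = micro_average(counts)
print(f"micro-averaged precision={precision:.3f}, recall={recall:.3f}")
```

Because the counts are pooled, frequent criterion types weigh more heavily in the result than rare ones, unlike macro-averaging, which averages the per-type scores.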
NLP can be used to extract eligibility criteria from EHR clinical notes and to automatically identify patients who may be eligible for a clinical trial with good accuracy, which could reduce the manual workload of screening patients for trials.