IUPUI, USA.
Indiana University School of Nursing, USA.
Health Informatics J. 2021 Jan-Mar;27(1):14604582211000785. doi: 10.1177/14604582211000785.
This research extracted patient-reported symptoms from free-text EHR notes of colorectal and breast cancer patients and studied the correlation of the symptoms with comorbid type 2 diabetes, race, and smoking status. First, an NLP framework was developed that uses UMLS MetaMap to extract all symptom terms from the 366,398 EHR clinical notes of 1694 colorectal cancer (CRC) patients and 3458 breast cancer (BC) patients. Semantic analysis and clustering algorithms were then developed to categorize the relevant symptoms into eight symptom clusters defined by seed terms. Once the relevant symptoms had been extracted, the frequency of symptoms reported by CRC and BC patients over three time periods post-chemotherapy was calculated. Logistic regression (LR) was performed with each symptom cluster as the response variable while controlling for diabetes, race, and smoking status. The results show that CRC and BC patients with type 2 diabetes (T2D) were more likely to report symptoms than CRC and BC patients without T2D over the three time periods in the cancer trajectory. We also found that current smokers were more likely than non-smokers to report anxiety (CRC, BC), neuropathic symptoms (CRC, BC), and depression (BC).
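The clustering and frequency steps described in the abstract can be illustrated with a minimal sketch. It assumes that symptom terms have already been extracted (for example by MetaMap) and that cluster membership is decided by exact match against seed terms. The seed terms, the three cluster names, the function names, and the sample data are all hypothetical. The abstract does not list the eight clusters or describe the semantic analysis, so this is not the authors' implementation.

```python
from collections import Counter

# Hypothetical seed terms. The paper defines eight clusters, but the abstract
# does not list them, so these three are illustrative assumptions only.
SEED_TERMS = {
    "anxiety": {"anxiety", "nervousness", "worry"},
    "depression": {"depression", "sadness"},
    "neuropathy": {"numbness", "tingling", "neuropathy"},
}


def assign_cluster(term):
    """Return the cluster whose seed set contains the term, or None."""
    normalized = term.strip().lower()
    for cluster, seeds in SEED_TERMS.items():
        if normalized in seeds:
            return cluster
    return None


def cluster_frequencies(mentions):
    """Count distinct patients reporting each cluster in each time period.

    `mentions` is an iterable of (patient_id, period, term) tuples.
    Repeated mentions by the same patient in the same period count once.
    """
    seen = set()
    for patient_id, period, term in mentions:
        cluster = assign_cluster(term)
        if cluster is not None:
            seen.add((patient_id, period, cluster))
    return Counter((period, cluster) for _, period, cluster in seen)


# Made-up example data, not taken from the study.
mentions = [
    ("p1", 1, "Numbness"),
    ("p1", 1, "tingling"),
    ("p2", 1, "anxiety"),
    ("p2", 2, "worry"),
    ("p3", 2, "fatigue"),
]
freq = cluster_frequencies(mentions)
```

Here `freq` maps each (period, cluster) pair to the number of distinct patients who reported at least one matching term. A per-patient indicator derived the same way could serve as the response variable in a logistic regression on diabetes, race, and smoking status.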