Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, Massachusetts.
Department of Surgical Oncology, Massachusetts General Hospital, Boston, Massachusetts.
J Pain Symptom Manage. 2018 Jun;55(6):1492-1499. doi: 10.1016/j.jpainsymman.2018.02.016. Epub 2018 Feb 27.
Clinicians document cancer patients' symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped.
To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes.
The data set included 103,564 sentences obtained from the electronic clinical notes of 2695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to predict words indicating an active symptom (positive label), absence of a symptom (negative label), or no symptom at all (neutral label). Human-labeled sentences were divided into training, validation, and test data sets. Final model performance was determined on a held-out 20% test set that was not used in model development or tuning.
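The data preparation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature set, the 10% validation fraction, the random seed, and the example sentence are all assumptions made for illustration (the abstract states only that 20% of labeled sentences were held out for testing). The function `token_features` produces the kind of per-word feature dictionary that conditional random field toolkits commonly accept, and `split_sentences` shows one way to hold out a test set.

```python
import random


def split_sentences(sentences, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle labeled sentences and split them into train/validation/test.

    test_frac=0.2 follows the abstract; val_frac and seed are assumptions.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    n_val = int(round(len(shuffled) * val_frac))
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def token_features(tokens, i):
    """Build a feature dict for the word at position i (hypothetical features)."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_title": word.istitle(),
        # Neighboring words let a sequence model pick up cues such as "denies".
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Hypothetical annotated sentence: one label per word.
tokens = ["Patient", "denies", "nausea", "but", "reports", "fatigue"]
labels = ["neutral", "neutral", "negative", "neutral", "neutral", "positive"]
features = [token_features(tokens, i) for i in range(len(tokens))]

# Stand-in for the 10,000 annotated sentences.
train, val, test = split_sentences(range(10000))
```

Including the previous and next word as features is one simple way to let the model distinguish a negated symptom ("denies nausea") from an active one ("reports fatigue").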
The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes.
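For readers less familiar with the metrics, the sketch below shows how per-label precision and recall can be computed from word-level gold and predicted labels. The toy label sequences are invented for illustration and are not study data; the function name `per_label_precision_recall` is likewise hypothetical.

```python
from collections import Counter


def per_label_precision_recall(gold, pred, labels):
    """Return {label: (precision, recall)} from parallel label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy example (not study data): one positive word is mislabeled as neutral.
gold = ["positive", "positive", "negative", "neutral", "neutral", "neutral"]
pred = ["positive", "neutral", "negative", "neutral", "neutral", "neutral"]
scores = per_label_precision_recall(gold, pred, ["positive", "negative", "neutral"])
```

In this toy case the positive label has perfect precision but only 0.5 recall, the same qualitative pattern as the reported results: words the model labels as symptoms are usually correct, but a share of true symptom mentions is missed.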
We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires further optimization to improve performance, particularly recall for positive and negative symptom labels, continued model development may yield machine learning methods suitable for deployment in routine clinical care, quality improvement, and research applications.