Lin Chen, Karlson Elizabeth W, Dligach Dmitriy, Ramirez Monica P, Miller Timothy A, Mo Huan, Braggs Natalie S, Cagan Andrew, Gainer Vivian, Denny Joshua C, Savova Guergana K
Boston Children's Hospital, Informatics Program, Boston, Massachusetts, USA
*CL, EWK and DD are co-first authors.
Division of Rheumatology, Immunology and Allergy, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
J Am Med Inform Assoc. 2015 Apr;22(e1):e151-61. doi: 10.1136/amiajnl-2014-002642. Epub 2014 Oct 25.
To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) for automatically identifying patients with rheumatoid arthritis (RA) who have methotrexate-induced liver transaminase abnormalities, by adding temporal features.
Codified information and a string-matching algorithm were applied to an RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was our key method. For features, the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary terms from relevant sections of the unstructured clinical narrative. Temporal features were also extracted to assess the temporal relevance of each event mention with respect to the date of the transaminase abnormality. All features were encapsulated in 3-month-long episodes for classification. Results were summarized at the patient level for a training set (N=480 patients) and evaluated on a test set (N=120 patients).
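The episode framing described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the 90-day window centered on the abnormal transaminase date, the BEFORE/SAME_DAY/AFTER temporal bins, the `concept|bin` feature encoding, the rule that a patient is positive if any episode is positive, and all names (`temporal_bin`, `episode_features`, `classify_patient`, `toy_rule`, and the example concept strings) are hypothetical. The abstract does not specify these details, and the study used a trained supervised classifier rather than the toy rule shown here.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Callable, List, Tuple

# Assumption: a "3-month" episode is approximated as 90 days centered on the
# date of the abnormal transaminase result. The paper's exact window may differ.
HALF_WINDOW = timedelta(days=45)

# (concept string from an NLP pipeline such as cTAKES, date of the note)
Mention = Tuple[str, date]


def temporal_bin(mention_date: date, abnormal_date: date) -> str:
    """Hypothetical temporal feature: where a mention falls relative to the abnormal lab."""
    if mention_date < abnormal_date:
        return "BEFORE"
    if mention_date > abnormal_date:
        return "AFTER"
    return "SAME_DAY"


def episode_features(abnormal_date: date, mentions: List[Mention]) -> Counter:
    """Bag of concept|temporal-bin counts for mentions inside the episode window."""
    feats: Counter = Counter()
    for concept, when in mentions:
        if abs(when - abnormal_date) <= HALF_WINDOW:
            feats[f"{concept}|{temporal_bin(when, abnormal_date)}"] += 1
    return feats


def classify_patient(
    abnormal_dates: List[date],
    mentions: List[Mention],
    episode_classifier: Callable[[Counter], bool],
) -> bool:
    """Assumed aggregation rule: a patient is positive if any episode is classified positive."""
    return any(
        episode_classifier(episode_features(d, mentions)) for d in abnormal_dates
    )


# Usage with made-up data (hypothetical concept strings and dates).
mentions = [
    ("methotrexate", date(2010, 1, 5)),
    ("elevated ALT", date(2010, 2, 1)),
    ("methotrexate", date(2011, 6, 1)),  # falls outside the episode below
]
abnormal = [date(2010, 2, 1)]


def toy_rule(feats: Counter) -> bool:
    """Toy stand-in for a trained episode classifier; NOT the paper's model."""
    return feats["methotrexate|BEFORE"] > 0


print(episode_features(abnormal[0], mentions))
print(classify_patient(abnormal, mentions, toy_rule))  # True
```

Passing the episode classifier in as a callable keeps feature construction separate from the learning step, so any trained model that maps a feature bag to a label could be substituted for `toy_rule`.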
On the test set, the system achieved a positive predictive value (PPV) of 0.756, a sensitivity of 0.919, and an F1 score of 0.829, significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, which included framing the phenotype problem as an episode-level classification task and adding temporal information, all proved highly effective.
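The F1 score is the harmonic mean of PPV (precision) and sensitivity (recall), so the reported figures can be cross-checked with a few lines of Python. The helper name `f1_score` is our own illustration. Recomputing from the rounded values gives about 0.830 for the system (the reported 0.829 was presumably computed from unrounded PPV and sensitivity) and 0.642 for the baseline.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


system = f1_score(0.756, 0.919)
baseline = f1_score(0.590, 0.703)
print(f"system F1 ~ {system:.3f}")      # system F1 ~ 0.830
print(f"baseline F1 ~ {baseline:.3f}")  # baseline F1 ~ 0.642
```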
Automated discovery of the methotrexate-induced liver toxicity phenotype in patients with RA, based on structured and unstructured information in the EMR, produces accurate results. Our work demonstrates that adding temporal features significantly improved classification results.