Jonnagaddala Jitendra, Liaw Siaw-Teng, Ray Pradeep, Kumar Manish, Chang Nai-Wen, Dai Hong-Jie
School of Public Health and Community Medicine, University of New South Wales, Australia; Asia-Pacific Ubiquitous Healthcare Research Centre, University of New South Wales, Australia; Prince of Wales Clinical School, University of New South Wales, Australia.
School of Public Health and Community Medicine, University of New South Wales, Australia.
J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S203-S210. doi: 10.1016/j.jbi.2015.08.003. Epub 2015 Aug 28.
Coronary artery disease (CAD) often leads to myocardial infarction, which may be fatal. Risk factors can be used to predict CAD, which in turn may enable prevention or early intervention. Patient data such as co-morbidities, medication history, social history and family history are required to determine the risk factors for a disease. However, risk factor data are usually embedded in unstructured clinical narratives when the data are not collected specifically for risk assessment purposes. Clinical text mining can be used to extract data related to risk factors from unstructured clinical notes. This study presents methods to extract Framingham risk factors from unstructured electronic health records using clinical text mining and to calculate 10-year coronary artery disease risk scores in a cohort of diabetic patients. We developed a rule-based system to extract risk factors: age, gender, total cholesterol, HDL-C, blood pressure, diabetes history and smoking history. The results showed that the output from the text mining system was reliable, but a substantial amount of the data needed to calculate the Framingham risk score was missing. We followed a systematic approach to understand the missing data and then implemented imputation strategies. An analysis of the 10-year Framingham risk scores for coronary artery disease in this cohort showed that the majority of the diabetic patients are at moderate risk of CAD.
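The abstract describes a rule-based system that extracts Framingham risk factors from clinical notes and reports that many of the values needed for scoring were missing. The following is a minimal sketch of that general idea, not the study's implementation. The regular expressions, the field names, the helper functions `extract_risk_factors` and `missing_counts`, and the two example notes are all hypothetical and chosen only for illustration. The sketch ignores negation for diabetes, units, and multiple measurements per note.

```python
import re
from typing import Dict, List, Union

Value = Union[int, str, bool, None]

# Hypothetical rules for illustration only; not the rules used in the study.
# Each pattern captures the value of interest in group 1.
PATTERNS = {
    "age": re.compile(r"\b(\d{1,3})[- ]?(?:year[- ]old|yo|y/o)\b", re.I),
    "gender": re.compile(r"\b(male|female|man|woman)\b", re.I),
    "total_cholesterol": re.compile(r"\btotal cholesterol\D{0,15}(\d{2,3})\b", re.I),
    "hdl_c": re.compile(r"\bHDL(?:-C)?\D{0,15}(\d{2,3})\b", re.I),
    "systolic_bp": re.compile(
        r"\b(?:BP|blood pressure)\D{0,15}(\d{2,3})\s*/\s*\d{2,3}\b", re.I
    ),
    "smoker": re.compile(
        r"\b(non-?smoker|never smoked|denies smoking|smoker|smokes)\b", re.I
    ),
    "diabetes": re.compile(r"\b(diabetes|diabetic|T2DM)\b", re.I),
}

NUMERIC = {"age", "total_cholesterol", "hdl_c", "systolic_bp"}


def extract_risk_factors(note: str) -> Dict[str, Value]:
    """Apply each rule to one note; a factor with no match is recorded as None."""
    record: Dict[str, Value] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        if match is None:
            record[name] = None  # missing: no rule fired for this factor
            continue
        text = match.group(1).lower()
        if name in NUMERIC:
            record[name] = int(text)
        elif name == "gender":
            record[name] = "female" if text in ("female", "woman") else "male"
        elif name == "smoker":
            # Negated mentions ("non-smoker", "denies smoking") map to False.
            record[name] = text in ("smoker", "smokes")
        else:
            # Diabetes: any mention counts as present (no negation handling).
            record[name] = True
    return record


def missing_counts(records: List[Dict[str, Value]]) -> Dict[str, int]:
    """Count, per risk factor, how many records have no extracted value."""
    return {name: sum(r[name] is None for r in records) for name in PATTERNS}


# Invented example notes, not taken from the study data.
notes = [
    "62-year-old male with type 2 diabetes. BP 148/92. "
    "Total cholesterol 212 mg/dL, HDL-C 38 mg/dL. Current smoker.",
    "Female patient, diabetic, denies smoking. Blood pressure: 130/85.",
]

records = [extract_risk_factors(n) for n in notes]
for record in records:
    print(record)
print(missing_counts(records))
```

Counting the `None` entries per factor before any imputation gives a first view of how incomplete the inputs to a risk score are. In this sketch the second note lacks age, total cholesterol and HDL-C, so no score could be computed for it without imputing those values.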