Centre for Health Informatics (Martin, D'Souza, Lee, Eastwood, Quan) and Department of Community Health Sciences (Eastwood, Quan), Cumming School of Medicine, University of Calgary; Alberta Health Services (Martin, D'Souza, Lee), Calgary, Alta.; Department of Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alta.
CMAJ Open. 2023 Feb 14;11(1):E131-E139. doi: 10.9778/cmajo.20210170. Print 2023 Jan-Feb.
Case identification is important for health services research, measuring health system performance and risk adjustment, but existing methods based on manual chart review or diagnosis codes can be expensive, time consuming or of limited validity. We aimed to develop a hypertension case definition in electronic medical records (EMRs) for inpatient clinical notes using machine learning.
We randomly selected a cohort of patients 18 years of age or older who were discharged from 1 of 3 Calgary acute care facilities (1 academic hospital and 2 community hospitals) between Jan. 1 and June 30, 2015. We compared the performance of EMR phenotype algorithms developed using machine learning with that of an algorithm based on the Canadian version of the International Classification of Diseases (ICD) in identifying patients with hypertension. Hypertension status was determined by chart review; the machine-learning algorithms used EMR notes, and the ICD algorithm used the Discharge Abstract Database (Canadian Institute for Health Information).
Of our study sample (n = 3040), 1475 (48.5%) patients had hypertension. The group with hypertension was older (median age of 71.0 yr v. 52.5 yr for those patients without hypertension) and had a lower proportion of females (710 [48.2%] v. 764 [52.3%]). Our final EMR-based models had higher sensitivity than the ICD algorithm (> 90% v. 47%), while maintaining high positive predictive values (> 90% v. 97%).
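For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and positive predictive value can be computed when an algorithm's output is compared with a chart-review reference standard. The function name and the toy labels are hypothetical and are not taken from the study; the toy data only illustrate how an algorithm can combine low sensitivity with a high positive predictive value.

```python
from typing import Sequence, Tuple


def sensitivity_and_ppv(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value).

    reference: True where the reference standard (e.g., chart review)
               says the patient has the condition.
    predicted: True where the algorithm flags the patient.
    Assumes at least one reference-positive and one predicted-positive case.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r and p)
    false_neg = sum(1 for r, p in pairs if r and not p)
    false_pos = sum(1 for r, p in pairs if not r and p)

    sensitivity = true_pos / (true_pos + false_neg)  # share of true cases found
    ppv = true_pos / (true_pos + false_pos)          # share of flags that are correct
    return sensitivity, ppv


# Hypothetical toy labels (not study data): 4 true cases, 2 of them flagged,
# and no false alarms.
toy_reference = [True, True, True, True, False, False]
toy_predicted = [True, True, False, False, False, False]
toy_sensitivity, toy_ppv = sensitivity_and_ppv(toy_reference, toy_predicted)
```

In this toy case the algorithm misses half of the true cases but every patient it flags is a true case, which is the general pattern of a high-precision, low-recall case definition.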
We found that hypertension tends to be clearly documented in EMRs and is well classified by concept search on free text. Machine learning can provide insights into how and where conditions are documented in EMRs and can suggest non-machine-learning phenotypes to implement.
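As an illustration of what a concept search on free text might look like, the sketch below flags a note when it mentions a hypertension term that is not preceded by a negation cue in the same sentence. It is not the authors' algorithm: the term list, the negation cues and the function name are assumptions chosen for illustration, and a real phenotype would need a validated vocabulary and more careful handling of context (for example, "pulmonary hypertension" or family history).

```python
import re

# Hypothetical term list for the concept; a real phenotype would use a
# validated vocabulary rather than these three strings.
CONCEPT = re.compile(r"\b(hypertension|hypertensive|htn)\b", re.IGNORECASE)

# Hypothetical negation cues, checked only within the same sentence.
NEGATION_CUE = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def mentions_hypertension(note: str) -> bool:
    """Return True if the note has at least one non-negated concept mention."""
    for match in CONCEPT.finditer(note):
        # Find where the current sentence starts (after the last '.' or ';').
        sentence_start = max(
            note.rfind(".", 0, match.start()),
            note.rfind(";", 0, match.start()),
        ) + 1
        preceding = note[sentence_start:match.start()]
        if not NEGATION_CUE.search(preceding):
            return True
    return False
```

A rule of this kind is transparent and cheap to run, which is one reason a well-documented condition may not need a trained model once the relevant terms and note sections are known.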