Department of Biomedical Informatics, Columbia University, New York, NY, USA.
BMC Med Inform Decis Mak. 2014 Jun 11;14:51. doi: 10.1186/1472-6947-14-51.
To demonstrate that subject selection based on sufficient laboratory results and medication orders in electronic health records can be biased towards sick patients.
Using electronic health record data from 10,000 patients who received anesthetic services at a major metropolitan tertiary care academic medical center, an affiliated hospital for women and children, and an affiliated urban primary care hospital, we assessed the correlation between patient health status, as indicated by the American Society of Anesthesiologists Physical Status Classification (ASA Class), and counts of days with laboratory results or medication orders, using a negative binomial regression model.
Higher ASA Class was associated with more data points: compared with ASA Class 1 patients, ASA Class 4 patients had 5.05 times the number of days with laboratory results and 6.85 times the number of days with medication orders, after controlling for age, sex, emergency status, admission type, primary diagnosis, and procedure.
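One way to read these ratios is sketched below, assuming the model used the log link that is conventional for negative binomial regression (the abstract does not state the link function). The symbols Y_i (count of days with data for patient i), \beta_k (ASA Class coefficients, with Class 1 as reference), K (highest ASA Class in the data), and \gamma, x_i (covariate coefficients and values) are notation introduced here for illustration, not taken from the paper.

```latex
% Assumed log-link mean model, with ASA Class 1 as the reference level
\log \mathbb{E}[Y_i] = \beta_0 + \sum_{k=2}^{K} \beta_k \,\mathbf{1}[\mathrm{ASA}_i = k] + \boldsymbol{\gamma}^{\top}\mathbf{x}_i

% Holding the covariates x fixed, the Class 4 vs. Class 1 ratio of expected counts is
\frac{\mathbb{E}[Y \mid \mathrm{ASA}=4, \mathbf{x}]}{\mathbb{E}[Y \mid \mathrm{ASA}=1, \mathbf{x}]} = e^{\beta_4}

% Under this reading, the reported ratios correspond to
% laboratory results:  e^{\beta_4} = 5.05 \;\Rightarrow\; \beta_4 = \ln 5.05 \approx 1.62
% medication orders:   e^{\beta_4} = 6.85 \;\Rightarrow\; \beta_4 = \ln 6.85 \approx 1.92
```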
Imposing data sufficiency requirements for subject selection allows researchers to minimize missing data when reusing electronic health records for research, but introduces a bias towards the selection of sicker patients. We demonstrated the relationship between patient health and quantity of data, which may result in a systematic bias towards the selection of sicker patients for research studies and limit the external validity of research conducted using electronic health record data. Additionally, we discovered other variables (i.e., admission status, age, emergency classification, procedure, and diagnosis) that independently affect data sufficiency.
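The selection mechanism described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: only the 5.05 rate ratio comes from the abstract, while the baseline mean of 2 days, the dispersion of 1, the equal class sizes, and the 5-day sufficiency threshold are hypothetical values chosen for illustration. Counts are drawn from a negative binomial distribution built as a gamma-Poisson mixture.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson count with Knuth's method (adequate for modest means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(rng, mean, dispersion):
    """Draw an overdispersed count as a gamma-Poisson mixture."""
    # Gamma with shape=dispersion and scale=mean/dispersion has the given mean.
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rng, rate)


def simulate_selection(seed=0, n_per_class=5000, baseline_mean=2.0,
                       rate_ratio=5.05, dispersion=1.0, min_days=5):
    """Return the ASA Class 4 share before and after a sufficiency filter.

    All defaults except rate_ratio are hypothetical illustration values.
    """
    rng = random.Random(seed)
    # Expected days with laboratory results per class (Class 1 is the baseline).
    class_means = {1: baseline_mean, 4: baseline_mean * rate_ratio}
    patients = [
        (asa_class, sample_negative_binomial(rng, mean, dispersion))
        for asa_class, mean in class_means.items()
        for _ in range(n_per_class)
    ]
    # Data sufficiency requirement: keep patients with at least min_days of data.
    selected = [p for p in patients if p[1] >= min_days]
    share_before = sum(1 for c, _ in patients if c == 4) / len(patients)
    share_after = sum(1 for c, _ in selected if c == 4) / len(selected)
    return share_before, share_after


if __name__ == "__main__":
    before, after = simulate_selection()
    print(f"ASA Class 4 share before filter: {before:.2f}")
    print(f"ASA Class 4 share after filter:  {after:.2f}")
```

In this toy setup the two classes start out equally represented, and the sufficiency filter shifts the selected cohort towards ASA Class 4. The size of the shift depends entirely on the assumed baseline, dispersion, and threshold.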