Pivovarov Rimma, Albers David J, Sepulveda Jorge L, Elhadad Noémie
Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA.
Department of Pathology and Cell Biology, Columbia University, 630 W. 168th Street, New York, NY, USA.
J Biomed Inform. 2014 Oct;51:24-34. doi: 10.1016/j.jbi.2014.03.016. Epub 2014 Apr 13.
Electronic health record (EHR) data show promise for deriving new ways of modeling human disease states. Although EHR researchers often use numerical values of laboratory tests as features in disease models, a great deal of information is contained in the context within which a laboratory test is taken. For example, the same numerical value of a creatinine test has a different interpretation for a chronic kidney disease patient than for a patient with acute kidney injury. We study whether EHR research studies are subject to biased results and interpretations if laboratory measurements taken in different contexts are not explicitly separated. We show that the context of a laboratory test measurement can often be captured by the way the test is measured through time. We perform three tasks to study the properties of these temporal measurement patterns. In the first task, we confirm that laboratory test measurement patterns provide information beyond the stand-alone numerical value. The second task identifies three measurement pattern motifs across a set of 70 laboratory tests performed for over 14,000 patients. Of these, one motif exhibits properties that can lead to biased research results. In the third task, we demonstrate the potential for biased results on a specific example. We conduct an association study of lipase test values with acute pancreatitis. We observe a diluted signal when using only a lipase value threshold, whereas the full association is recovered when properly accounting for lipase measurements in different contexts (leveraging the lipase measurement patterns to separate the contexts). Aggregating EHR data without separating distinct laboratory test measurement patterns can intermix patients with different diseases, leading to the confounding of signals in large-scale EHR analyses. This paper presents a methodology for leveraging measurement frequency to identify and reduce laboratory test biases.
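The idea of separating measurement contexts by how a test is measured through time can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not specify. It assumes, purely for illustration, that a measurement belongs to a "dense" context when the same patient has another measurement within a fixed gap, and to a "sparse" context otherwise. The function names, the 3-day threshold, and the sample records are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

SECONDS_PER_DAY = 86400.0


def measurement_gaps(times):
    """Return the gaps, in days, between consecutive measurement times."""
    ordered = sorted(times)
    return [(b - a).total_seconds() / SECONDS_PER_DAY
            for a, b in zip(ordered, ordered[1:])]


def split_by_gap(records, max_gap_days=3.0):
    """Split (patient_id, time, value) records into two contexts.

    A measurement is 'dense' if the same patient has another measurement
    within max_gap_days before or after it, and 'sparse' otherwise.
    The threshold is an illustrative assumption, not a published value.
    """
    by_patient = defaultdict(list)
    for pid, t, v in records:
        by_patient[pid].append((t, v))

    limit = max_gap_days * SECONDS_PER_DAY
    contexts = {"dense": [], "sparse": []}
    for pid, series in by_patient.items():
        series.sort(key=lambda tv: tv[0])
        for i, (t, v) in enumerate(series):
            near_prev = i > 0 and (t - series[i - 1][0]).total_seconds() <= limit
            near_next = (i + 1 < len(series)
                         and (series[i + 1][0] - t).total_seconds() <= limit)
            label = "dense" if (near_prev or near_next) else "sparse"
            contexts[label].append((pid, t, v))
    return contexts


# Made-up records for illustration only: (patient_id, time, test value).
records = [
    ("p1", datetime(2014, 1, 1), 900.0),
    ("p1", datetime(2014, 1, 2), 650.0),
    ("p1", datetime(2014, 1, 3), 300.0),
    ("p2", datetime(2014, 1, 1), 40.0),
    ("p2", datetime(2014, 7, 1), 45.0),
]
contexts = split_by_gap(records)
for label, rows in contexts.items():
    print(label, len(rows))
```

An association analysis could then be run within each context separately rather than on the pooled values, which is the kind of stratification the abstract argues for. A real analysis would need a principled way to choose the contexts instead of a single fixed threshold.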