Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Department of Medicine, The Ohio State University, Columbus, Ohio, USA.
J Am Med Inform Assoc. 2022 Jun 14;29(7):1131-1141. doi: 10.1093/jamia/ocac046.
A participant's medical history is important in clinical research and can be captured from electronic health records (EHRs) and self-reported surveys. Both sources can be incomplete: EHRs because of documentation gaps or lack of interoperability, and surveys because of recall bias or limited health literacy. This analysis compares medical history collected in the All of Us Research Program through both surveys and EHRs.
The All of Us medical history survey includes a self-report questionnaire that asks about diagnoses of over 150 medical conditions organized into 12 disease categories. In each category, we identified the 3 most frequent and 3 least frequent self-reported diagnoses and retrieved their analogues from EHRs. We calculated agreement scores and extracted participant demographic characteristics for each comparison set.
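As an illustration of how an agreement score between the two sources might be computed for one condition, the following Python sketch calculates positive specific agreement, 2a / (2a + b + c), where a is the number of participants with the condition in both sources and b and c are the numbers with it in only one source. The abstract does not state which agreement formula was used, so this formula, the function name `positive_agreement`, and the toy data are assumptions for illustration only, not the study's method.

```python
import math
from typing import Iterable, Tuple


def positive_agreement(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Positive specific agreement for (survey_yes, ehr_yes) pairs.

    Assumed formula (not confirmed by the abstract): 2a / (2a + b + c),
    where a = positive in both sources, b = survey only, c = EHR only.
    Returns NaN when no participant is positive in either source.
    """
    both = survey_only = ehr_only = 0
    for survey_yes, ehr_yes in pairs:
        if survey_yes and ehr_yes:
            both += 1
        elif survey_yes:
            survey_only += 1
        elif ehr_yes:
            ehr_only += 1
        # Pairs negative in both sources do not enter the formula.
    denominator = 2 * both + survey_only + ehr_only
    if denominator == 0:
        return math.nan
    return 2 * both / denominator


# Hypothetical toy data: one (survey, EHR) pair per participant.
toy_pairs = [(True, True), (True, False), (False, True), (False, False)]
score = positive_agreement(toy_pairs)
print(f"positive agreement: {score:.2f}")
```

Under this definition, participants who report no diagnosis in either source do not affect the score, which keeps rare conditions from appearing to agree well simply because most participants lack them.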
The 4th All of Us dataset release includes data from 314 994 participants, 28.3% of whom completed medical history surveys and 65.5% of whom had EHR data. The Hearing and vision category had the highest number of survey responses but the second lowest positive agreement with the EHR (0.21). The Infectious disease category had the lowest positive agreement (0.12). Cancer conditions had the highest positive agreement (0.45) between the 2 data sources.
Our study quantified the agreement of medical history between 2 sources: EHRs and self-reported surveys. Conditions that are usually undocumented in EHRs had low agreement scores, demonstrating that survey data can supplement EHR data. Disagreement between EHR and survey data can help identify possible missing records and guide researchers to adjust for biases.