Cai Ling, DeBerardinis Ralph J, Zhan Xiaowei, Xiao Guanghua, Xie Yang
Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States.
Children's Research Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States.
J Am Med Inform Assoc. 2024 Dec 1;31(12):2849-2856. doi: 10.1093/jamia/ocae236.
The increasing reliance on electronic health records (EHRs) for research and clinical care necessitates robust methods for assessing data quality and identifying inconsistencies. To address this need, we developed and applied the incongruence rate (IR) using sex-specific medical conditions. We also characterized participants with incongruent records to better understand the scope and nature of data discrepancies.
In this cross-sectional study, we used the All of Us Research Program's latest version 7 (v7) EHR data to identify prevalent sex-specific conditions and evaluated the occurrence of incongruent cases, quantified as IR.
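As an illustration only, the IR calculation can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes the IR is the share of participants with at least one sex-specific condition who have one or more conditions that conflict with their recorded sex. The function name, the record layout, and the two example conditions are hypothetical and are not taken from the study.

```python
# Minimal sketch of an incongruence rate (IR) calculation.
# ASSUMPTION: IR = participants with >= 1 incongruent sex-specific condition
#             divided by participants with >= 1 sex-specific condition.
# The condition list and records below are toy data, not All of Us data.

def incongruence_rate(records, condition_sex):
    """Return the assumed IR as a fraction between 0 and 1.

    records: iterable of (person_id, recorded_sex, condition) tuples.
    condition_sex: dict mapping a sex-specific condition to "M" or "F".
    """
    flagged = {}  # person_id -> True if any record conflicts with recorded sex
    for person_id, sex, condition in records:
        expected = condition_sex.get(condition)
        if expected is None:
            continue  # not a sex-specific condition; ignored in the IR
        incongruent = expected != sex
        flagged[person_id] = flagged.get(person_id, False) or incongruent
    if not flagged:
        return 0.0
    return sum(flagged.values()) / len(flagged)


# Hypothetical sex-specific conditions (illustrative only).
CONDITION_SEX = {"prostate cancer": "M", "ovarian cyst": "F"}

# Toy records: person 3 has both a congruent and an incongruent condition.
RECORDS = [
    (1, "M", "prostate cancer"),
    (2, "F", "ovarian cyst"),
    (3, "M", "ovarian cyst"),
    (3, "M", "prostate cancer"),
    (4, "F", "hypertension"),
    (5, "F", "ovarian cyst"),
]

ir = incongruence_rate(RECORDS, CONDITION_SEX)
print(f"IR = {ir:.2%}")
```

In this toy data, four participants have a sex-specific condition and one of them has an incongruent record. Counting participants rather than individual records is a design choice of this sketch; a record-level rate would use a different denominator.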
Among the 92 597 males and 152 551 females with condition occurrence data available from All of Us and sex-conformed gender, we identified 167 prevalent sex-specific conditions. Among the 37 537 biological males and 95 499 biological females with these sex-specific conditions, we detected an overall IR of 0.86%. Attempts to include non-cisgender participants resulted in an inflated overall IR. Additionally, a significant proportion of participants with incongruent conditions also presented with conditions congruent with their biological sex, indicating a mix of accurate and erroneous records. These incongruences were not geographically or temporally isolated, suggesting systematic issues in EHR data integrity.
Our findings call attention to the existence of systemic data incongruences in sex-specific conditions and the need for robust validation checks. Extending IR evaluation to non-cisgender participants or non-sex-based conditions remains a challenge.
The IR based on sex-specific conditions, when applied to adult populations, provides a valuable metric for data quality assessment in EHRs.