Nelson Stuart J, Yin Ying, Trujillo Rivera Eduardo A, Shao Yijun, Ma Phillip, Tuttle Mark S, Garvin Jennifer, Zeng-Treitler Qing
Biomedical Informatics Center, George Washington University, Washington, DC, USA.
Center for Data Science and Outcomes Research, Washington DC VA Medical Center, Washington, DC, USA.
Digit Health. 2024 Oct 29;10:20552076241297056. doi: 10.1177/20552076241297056. eCollection 2024 Jan-Dec.
International Classification of Diseases (ICD) codes recorded in electronic health records (EHRs) are frequently used to create patient cohorts or define phenotypes. Inconsistent assignment of codes may reduce the utility of such cohorts. We assessed the reliability across time and location of the assignment of ICD codes in a US health system at the time of the transition from ICD-9-CM (ICD, 9th Revision, Clinical Modification) to ICD-10-CM (ICD, 10th Revision, Clinical Modification).
Using clusters of equivalent codes derived from the US Centers for Disease Control and Prevention General Equivalence Mapping (GEM) tables, ICD assignments occurring during the ICD-9-CM to ICD-10-CM transition were investigated in EHR data from the US Veterans Administration Central Data Warehouse using deep learning and statistical models. These models were then used to detect abrupt changes across the transition; additionally, changes at each VA station were examined.
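A minimal sketch of the two steps described above, under stated assumptions. The abstract does not say how the clusters were derived from the GEM tables; here a cluster is assumed to be a connected component of the graph whose edges are (ICD-9-CM, ICD-10-CM) mapping pairs. The level-shift ratio is a crude stand-in for the deep learning and statistical models, not the authors' method. The function names `build_clusters` and `level_shift_ratio` and the placeholder codes are hypothetical.

```python
from collections import defaultdict


def build_clusters(gem_pairs):
    """Group codes into clusters of equivalents.

    Assumption: a cluster is a connected component of the graph whose
    edges are the (icd9, icd10) pairs listed in a GEM table.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each code with its code system so identical strings cannot collide.
    for icd9, icd10 in gem_pairs:
        union(("ICD9", icd9), ("ICD10", icd10))

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


def level_shift_ratio(monthly_counts, transition_index):
    """Ratio of mean monthly count after the transition to the mean before.

    A deliberately simple illustration of an abrupt-change check; values
    far from 1.0 suggest a discontinuity at the transition.
    """
    pre = monthly_counts[:transition_index]
    post = monthly_counts[transition_index:]
    if not pre or not post:
        raise ValueError("need at least one month on each side of the transition")
    pre_mean = sum(pre) / len(pre)
    post_mean = sum(post) / len(post)
    return post_mean / pre_mean if pre_mean else float("inf")


# Placeholder codes for illustration only; these are not real GEM entries.
example_pairs = [("A9", "X10"), ("A9", "Y10"), ("B9", "Y10"), ("C9", "Z10")]
example_clusters = build_clusters(example_pairs)
example_ratio = level_shift_ratio([10, 10, 10, 20, 20], transition_index=3)
```

Applying the same ratio separately to each station's counts for a cluster would give a rough per-location comparison of the kind the abstract describes.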
Many of the 687 most-used code clusters had ICD-10-CM assignments that differed greatly from those predicted from the codes used in ICD-9-CM. Manual review of a random sample found that 66% of the clusters showed problematic changes, with 37% having no apparent explanation. Notably, the observed pattern of changes varied widely across care locations.
The observed coding variability across time and across location suggests that ICD codes in EHRs are insufficient to establish a semantically reliable cohort or phenotype. While some variation might be expected with a change in coding structure, the inconsistency across locations suggests other difficulties. Researchers should consider carefully how cohorts and phenotypes of interest are selected and defined.