Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
Analytics, Alberta Health Services, Calgary, Alberta, Canada.
BMC Med Inform Decis Mak. 2020 Apr 25;20(1):75. doi: 10.1186/s12911-020-1089-0.
Data quality assessment presents a challenge for research using coded administrative health data. The objective of this study is to develop and validate a set of coding association rules for coded diagnostic data.
We used Canadian re-abstracted hospital discharge abstract data coded with the International Classification of Diseases, 10th revision (ICD-10). Association rule mining was conducted on the re-abstracted data in four age groups (0-4, 20-44, 45-64, and ≥65) to extract ICD-10 coding association rules at the three-digit level (category of diagnosis) and the four-digit level (category of diagnosis with etiology, anatomy, or severity). The rules were reviewed by a panel of 5 physicians and 2 classification specialists using a modified Delphi rating process. We proposed and defined variance and bias measures to assess data quality using the rules.
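To make the mining step concrete, the following is a minimal sketch of pairwise association rule mining over ICD-10 codes truncated to the three-digit level. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`truncate`, `mine_pair_rules`), the thresholds, and the toy records are all hypothetical, and the abstract does not say which algorithm, software, or cut-offs the study used.

```python
from collections import Counter
from itertools import combinations


def truncate(code, digits=3):
    """Reduce an ICD-10 code such as 'E11.9' to its first `digits` characters (dot removed)."""
    return code.replace(".", "")[:digits]


def mine_pair_rules(records, min_support=0.3, min_confidence=0.6, digits=3):
    """Return rules (lhs, rhs, support, confidence, lift) for pairs of co-coded diagnoses.

    Thresholds are illustrative assumptions, not values taken from the study.
    """
    n = len(records)
    # One set of truncated codes per discharge record.
    baskets = [frozenset(truncate(c, digits) for c in rec) for rec in records]

    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)
                rules.append((lhs, rhs, support, confidence, lift))
    # Highest confidence first, then alphabetical for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Toy records (hypothetical): each list holds the diagnosis codes of one discharge.
records = [
    ["E11.9", "I10", "N18.3"],
    ["E11.2", "I10"],
    ["I10", "I25.1"],
    ["E11.9", "I10", "I25.1"],
    ["J18.9"],
]
rules = mine_pair_rules(records)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

Passing `digits=4` would mine at the four-digit level instead. A real analysis would run this separately within each age group and would typically use a full frequent-itemset algorithm, so that rules with more than one antecedent code can be found.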
After the rule mining process and the panel review, 388 rules at the three-digit level and 275 rules at the four-digit level were developed. Half of the rules came from the ≥65 age group. The rules captured meaningful age-specific clinical associations, and those for the ≥65 age group were more complex and comprehensive than those for the other age groups. The variance and bias measures can identify rules with high bias and variance in Alberta data and provide direction for quality improvement.
A set of ICD-10 data quality rules was developed and validated by a clinical and classification expert panel. The rules can be used as a tool to assess ICD-coded data, enabling the monitoring and comparison of data quality across institutions, provinces, and countries.
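As a rough illustration of how a validated rule might be applied across institutions, the sketch below computes how often a rule holds in each site's data and compares it with a reference value. This is an assumption-laden example: `rule_confidence`, `confidence_gap`, the reference value of 0.9, and the site data are hypothetical, and the gap shown here is a simple difference, not the variance and bias measures defined in the paper (which the abstract does not spell out).

```python
def rule_confidence(records, lhs, rhs):
    """Share of records containing `lhs` that also contain `rhs`; None if `lhs` never occurs."""
    with_lhs = [rec for rec in records if lhs in rec]
    if not with_lhs:
        return None
    return sum(rhs in rec for rec in with_lhs) / len(with_lhs)


def confidence_gap(records, lhs, rhs, reference_confidence):
    """Observed minus reference confidence (a hypothetical deviation measure)."""
    observed = rule_confidence(records, lhs, rhs)
    return None if observed is None else observed - reference_confidence


# Hypothetical three-digit code sets per discharge at two institutions.
sites = {
    "site_a": [{"E11", "I10"}, {"E11", "I10"}, {"E11"}, {"I10"}],
    "site_b": [{"E11", "I10"}, {"E11", "I10"}],
}
reference = 0.9  # assumed confidence of the rule E11 -> I10 in reference data

for name, recs in sites.items():
    gap = confidence_gap(recs, "E11", "I10", reference)
    print(f"{name}: gap = {gap:+.2f}")
```

A large negative gap at one site would suggest that the consequent code is recorded less often than expected there, which is the kind of signal that could prompt a closer coding review.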