Department of Pediatrics, University of Texas Health Science Center at Houston, Houston, TX, USA.
Department of Pediatrics, Division of Neonatal-Perinatal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA.
J Perinatol. 2019 Mar;39(3):468-474. doi: 10.1038/s41372-018-0311-8. Epub 2019 Jan 24.
To determine sources of error in data electronically extracted from electronic health records.
Categorical and continuous variables related to early-onset neonatal hypoglycemia were preselected and electronically extracted from the records of 100 randomly selected neonates within 3479 births with laboratory-proven early-onset hypoglycemia. The extraction language was written by an information technologist, and the extracted data were validated by blinded manual chart review. The kappa coefficient was used to assess categorical variables, and percent validity was used to assess continuous variables.
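As an illustration of the two agreement measures named above, the following is a minimal sketch, not the study's actual validation code. It assumes Cohen's kappa is the kappa statistic used, and it assumes "percent validity" means the percentage of extracted values that match the chart-review value within a tolerance; the function names, the `tol` parameter, and the sample values are hypothetical.

```python
from collections import Counter


def cohen_kappa(extracted, reviewed):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(extracted) != len(reviewed) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(extracted)
    # Observed agreement: fraction of records where both sources agree.
    p_o = sum(a == b for a, b in zip(extracted, reviewed)) / n
    # Chance agreement, from each source's marginal label frequencies.
    freq_e, freq_r = Counter(extracted), Counter(reviewed)
    p_e = sum(freq_e[k] * freq_r[k] for k in freq_e) / (n * n)
    if p_e == 1.0:
        return 1.0  # both sources used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


def percent_validity(extracted, reviewed, tol=0.0):
    """Percentage of extracted values within `tol` of the reviewed value.

    Assumed definition; the abstract does not spell out the formula.
    """
    if len(extracted) != len(reviewed) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(abs(a - b) <= tol for a, b in zip(extracted, reviewed))
    return 100.0 * matches / len(extracted)


# Hypothetical data: 1 = diagnosis flagged, 0 = not flagged.
ehr_flags = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
chart_flags = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(ehr_flags, chart_flags)

# Hypothetical continuous values (for example, a glucose reading in mg/dL).
validity = percent_validity([40, 45, 50, 55], [40, 45, 50, 60])
```

Kappa discounts the agreement expected by chance, which is why a variable can show high raw agreement yet a low kappa when one category dominates.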
Of 23 categorical variables, 8 (35%) had acceptable kappa (0.81-1.00) and 5 (22%) had slight-to-fair agreement (kappa < 0.40). Notably, "hypoglycemia" had poor agreement (kappa 0.16). In contrast, 6 of 8 continuous variables (75%) had validity ≥ 94%. After the extraction language was corrected, 6 of 9 variables were corrected and inter-rater validation improved. However, "hypoglycemia" was not corrected and remained an issue.
Data extraction without validation procedures, especially for categorical variables identified by International Classification of Diseases, Ninth Revision (ICD-9) codes, often results in incorrect data identification. Electronically extracted data must incorporate built-in validation processes.