Ostropolets Anna, Reich Christian, Ryan Patrick, Shang Ning, Hripcsak George, Weng Chunhua
Columbia University Medical Center, New York, NY, USA; Observational Health Data Sciences and Informatics (OHDSI), New York, NY, USA.
IQVIA, Cambridge, MA, USA; Observational Health Data Sciences and Informatics (OHDSI), New York, NY, USA.
J Biomed Inform. 2020 Feb;102:103363. doi: 10.1016/j.jbi.2019.103363. Epub 2019 Dec 19.
Algorithms that identify patients of interest from observational data must handle missing and inaccurate data, and should ideally perform comparably on administrative claims and electronic health record data. However, administrative claims data lack the information needed to develop accurate algorithms for disorders that require laboratory results, and this omission can make diagnostic code-based algorithms insensitive. In this paper, we tested our assertion that the performance of a diagnosis code-based algorithm for chronic kidney disorder (CKD) can be improved by adding other codes indirectly related to CKD (e.g., codes for dialysis, kidney transplant, suspicious kidney disorders). Following the best practices from Observational Health Data Sciences and Informatics (OHDSI), we adapted an electronic health record-based gold standard algorithm for CKD and then created algorithms that can be executed on administrative claims data and that account for related data quality issues. We externally validated our algorithms on four electronic health record datasets in the OHDSI network. Compared to the algorithm that uses CKD diagnostic codes only, the positive predictive value of the algorithms that use additional codes was slightly higher (47.9-48.5% vs. 47.4%). The algorithms adapted from the gold standard algorithm can be used to infer chronic kidney disorder from administrative claims data. By using data and vocabulary standardized across the OHDSI network, we improved the generalizability and consistency of the CKD phenotypes, although performance still varies across datasets. We showed that identifying and addressing coding and data heterogeneity can improve the performance of the algorithms.
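The comparison described in the abstract, a diagnosis-code-only algorithm versus one that also uses indirectly related codes, each scored by positive predictive value against a gold standard, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the code sets, patient records, gold-standard labels, and function names below are all hypothetical placeholders chosen for illustration, and the toy numbers bear no relation to the reported results.

```python
# Hypothetical code sets: placeholders, NOT the concept sets used in the paper.
CKD_DIAGNOSIS_CODES = {"CKD_DX_1", "CKD_DX_2"}
RELATED_CODES = {"DIALYSIS", "KIDNEY_TRANSPLANT"}


def flag_patients(records, code_set):
    """Return the IDs of patients who have at least one code in code_set."""
    return {pid for pid, codes in records.items() if codes & code_set}


def positive_predictive_value(flagged, gold_positive):
    """PPV = true positives / all patients flagged by the algorithm."""
    if not flagged:
        raise ValueError("PPV is undefined when no patients are flagged")
    return len(flagged & gold_positive) / len(flagged)


# Toy data (assumed): patient ID -> set of codes observed in claims.
records = {
    "p1": {"CKD_DX_1"},
    "p2": {"DIALYSIS"},
    "p3": {"CKD_DX_2", "OTHER"},
    "p4": {"OTHER"},
}
# Toy gold standard (assumed): patients confirmed by the reference algorithm.
gold = {"p1", "p2"}

# Algorithm A: CKD diagnosis codes only.
narrow = flag_patients(records, CKD_DIAGNOSIS_CODES)
# Algorithm B: diagnosis codes plus indirectly related codes.
broad = flag_patients(records, CKD_DIAGNOSIS_CODES | RELATED_CODES)

ppv_narrow = positive_predictive_value(narrow, gold)
ppv_broad = positive_predictive_value(broad, gold)
print(f"PPV, diagnosis codes only: {ppv_narrow:.3f}")
print(f"PPV, with related codes:   {ppv_broad:.3f}")
```

In this toy setup, adding the related codes captures one more true case (p2) without adding false positives, so the PPV rises. On real data the effect depends on how reliably the added codes indicate CKD, which is consistent with the small difference the abstract reports.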