Department of Pediatrics, School of Medicine, West Virginia University, Morgantown, West Virginia, United States.
West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, West Virginia, United States.
Appl Clin Inform. 2021 Jan;12(1):10-16. doi: 10.1055/s-0040-1721012. Epub 2021 Jan 6.
The United States, and West Virginia in particular, has a tremendous burden of coronary artery disease (CAD). Undiagnosed familial hypercholesterolemia (FH) is an important contributor to CAD in the U.S. Identifying a CAD phenotype is an initial step toward finding families with FH.
We hypothesized that a CAD phenotype detection algorithm that uses discrete data elements from electronic health records (EHRs) can be validated from EHR information housed in a data repository.
We developed an algorithm to detect a CAD phenotype which searched through discrete data elements, such as diagnosis, problem lists, medical history, billing, and procedure (International Classification of Diseases [ICD]-9/10 and Current Procedural Terminology [CPT]) codes. The algorithm was applied to two cohorts of 500 patients, each with varying characteristics. The second (younger) cohort consisted of parents from a school child screening program. We then determined which patients had CAD by systematic, blinded review of EHRs. Following this, we revised the algorithm by refining the acceptable diagnoses and procedures. We ran the second algorithm on the same cohorts and determined the accuracy of the modification.
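The code-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the study's algorithm: the `PatientRecord` structure, the `has_cad_phenotype` function, and the code lists are hypothetical placeholders, since the abstract does not list the accepted diagnoses and procedures.

```python
from dataclasses import dataclass, field

# Placeholder code lists (assumptions for illustration only). The abstract
# does not give the accepted ICD-9/10 or CPT codes used by the authors.
CAD_DIAGNOSIS_PREFIXES = ("I25", "414")              # ICD-10 / ICD-9 code families
CAD_PROCEDURE_CODES = frozenset({"93455", "33533"})  # example CPT codes


@dataclass
class PatientRecord:
    """Hypothetical container for the discrete EHR elements named in the text."""
    patient_id: str
    diagnosis_codes: list = field(default_factory=list)
    problem_list_codes: list = field(default_factory=list)
    history_codes: list = field(default_factory=list)
    billing_codes: list = field(default_factory=list)
    procedure_codes: list = field(default_factory=list)


def has_cad_phenotype(record: PatientRecord) -> bool:
    """Flag a patient if any discrete element carries an accepted code."""
    icd_sources = (
        record.diagnosis_codes
        + record.problem_list_codes
        + record.history_codes
        + record.billing_codes
    )
    # str.startswith accepts a tuple, so one call checks every prefix.
    diagnosis_hit = any(
        code.upper().startswith(CAD_DIAGNOSIS_PREFIXES) for code in icd_sources
    )
    procedure_hit = any(code in CAD_PROCEDURE_CODES for code in record.procedure_codes)
    return diagnosis_hit or procedure_hit
```

In this sketch, revising the algorithm as the authors describe would amount to editing the two code lists and rerunning the function over the same cohorts.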
CAD phenotype Algorithm I was 89.6% accurate, 94.6% sensitive, and 85.6% specific for group 1. The revised algorithm (denoted CAD Algorithm II), applied to the same groups 1 and 2, achieved a sensitivity of 98.2%, a specificity of 87.8%, and an accuracy of 92.4% for group 1, and an accuracy of 93% for group 2. The group 1 F1 score was 92.4%. Specific ICD-10 and CPT codes such as "coronary angiography through a vein graft" were more useful than generic terms.
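For readers less familiar with the reported measures, the sketch below shows how sensitivity, specificity, accuracy, and F1 score are conventionally computed by comparing algorithm output with chart-review results. The function name `classification_metrics` and its inputs are assumptions for illustration; the study's underlying counts are not given in the abstract and are not reproduced here.

```python
def classification_metrics(predicted, actual):
    """Compare algorithm flags with chart-review labels (both lists of bools)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # flagged, CAD confirmed
    fp = sum(1 for p, a in pairs if p and not a)      # flagged, no CAD on review
    fn = sum(1 for p, a in pairs if not p and a)      # missed CAD
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly not flagged

    def ratio(numerator, denominator):
        # Return None rather than dividing by zero when a class is absent.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),            # also called recall
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Because F1 combines recall with precision, it penalizes false positives that sensitivity alone does not reflect.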
We have created an algorithm, CAD Algorithm II, that detects CAD on a large scale with high accuracy and sensitivity (recall). It has proven useful among varied patient populations. Use of this algorithm can extend to monitor a registry of patients in an EHR and/or to identify a group such as those with likely FH.