Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, United States.
J Biomed Inform. 2011 Dec;44 Suppl 1:S63-S68. doi: 10.1016/j.jbi.2011.10.013. Epub 2011 Nov 7.
Cohort identification is an important step in conducting clinical research studies. Using ICD-9 codes to identify disease cohorts is a common approach that yields satisfactory results for certain conditions; for many use cases, however, more accurate methods are required. In this study, we propose a bootstrapping method that supplements ICD-9 codes with other data, such as lab results and medications, to build classification models that identify cohorts more accurately. The method does not require prior information about the true class of the patients. We used it to identify Diabetes Mellitus (DM) and Hyperlipidemia (HL) patient cohorts from a database of 800,000 patients. The method labeled as positive for DM 11,000 patients who had no DM-related ICD-9 codes, and as positive for HL 52,000 patients who had no HL codes. A review of 400 patient charts (200 per condition) by two clinicians shows that, for both conditions, the labels assigned by the proposed approach agree more closely with the clinicians' labels than the labels derived from ICD-9 codes do. The method is reasonably automated and, we believe, holds potential for inexpensive, more accurate cohort identification.
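The general idea of bootstrapping from ICD-9 codes can be illustrated with a minimal sketch. This is not the classification model used in the study, whose details the abstract does not give; it assumes a simple nearest-centroid classifier, and the names `Patient`, `has_icd9_code`, `features`, and `bootstrap_labels`, as well as the toy feature values, are hypothetical. The ICD-9 flag serves as a noisy starting label, and each patient is then relabeled from the other data until the labels stop changing.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Patient:
    has_icd9_code: bool        # noisy seed label taken from ICD-9 codes
    features: Sequence[float]  # hypothetical: [scaled lab value, on-medication flag]


def centroid(rows: List[Sequence[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bootstrap_labels(patients: List[Patient], max_rounds: int = 10) -> List[bool]:
    """Start from ICD-9 labels, then relabel by nearest class centroid
    until the labels are stable (assumed stand-in for the real classifier)."""
    labels = [p.has_icd9_code for p in patients]
    for _ in range(max_rounds):
        pos = [p.features for p, y in zip(patients, labels) if y]
        neg = [p.features for p, y in zip(patients, labels) if not y]
        if not pos or not neg:
            break  # cannot fit two classes; keep the current labels
        c_pos, c_neg = centroid(pos), centroid(neg)
        new_labels = [
            sq_dist(p.features, c_pos) < sq_dist(p.features, c_neg)
            for p in patients
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels


# Toy data: the third patient has no ICD-9 code but resembles the coded patients.
patients = [
    Patient(True, [0.90, 1.0]),
    Patient(True, [0.80, 1.0]),
    Patient(False, [0.85, 1.0]),
    Patient(False, [0.10, 0.0]),
    Patient(False, [0.20, 0.0]),
    Patient(False, [0.15, 0.0]),
]
print(bootstrap_labels(patients))
# prints [True, True, True, False, False, False]
```

In this toy run the third patient, who has no ICD-9 code, ends up labeled positive, which mirrors the kind of uncoded patient the method is reported to recover. No ground-truth labels are used at any point, consistent with the statement that the method does not require the true class of the patients.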