Boland Mary Regina, Tatonetti Nicholas P, Hripcsak George
Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA.
Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA; Department of Systems Biology, Columbia University, New York, NY USA; Department of Medicine, Columbia University, New York, NY USA.
J Biomed Semantics. 2015 Apr 6;6:14. doi: 10.1186/s13326-015-0010-8. eCollection 2015.
Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however, other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is independent of the individual patient's state, is relative to other phenotypes, and does not change from one patient to another. For example, acne is mild at the phenotype level relative to other phenotypes. A given patient may therefore have a severe form of acne (this is patient-level severity), but this does not affect the designation of acne as a mild phenotype at the phenotype level.
We present a method for classifying severity at the phenotype level that uses the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). Our method, the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR), combines multiple severity measures: number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR uses these severity measures as input to a random forest algorithm that discriminates between severe and mild phenotypes.
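The abstract names CAESAR's six input measures and its classifier but gives no training details. As a rough illustration only, the sketch below feeds six such measures into a simplified random forest: every tree is a single decision stump fit on a bootstrap sample using a random subset of features, and the forest predicts by majority vote. The feature order, the toy values, and the parameters `n_trees`, `max_features`, and `seed` are all hypothetical, and a practical forest would normally grow deeper trees on real EHR-derived features.

```python
import random
from collections import Counter

# Hypothetical feature order for the six severity measures named in the abstract.
FEATURES = ["comorbidities", "medications", "procedures",
            "cost", "treatment_time", "proportional_index"]

# Invented toy phenotypes (1 = severe, 0 = mild); NOT data from the paper.
TOY_ROWS = [
    [40, 35, 20, 9000, 60, 0.90],
    [32, 28, 15, 7000, 45, 0.80],
    [25, 30, 12, 8000, 50, 0.70],
    [3, 4, 1, 200, 5, 0.10],
    [2, 2, 0, 150, 3, 0.20],
    [4, 3, 1, 300, 7, 0.15],
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(side, fallback):
    """Most common label on one side of a split, or of `fallback` if that side is empty."""
    return Counter(side or fallback).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Choose the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, labels), majority(right, labels))
    return best[1:]  # (feature, threshold, label if <= t, label if > t)


def fit_forest(rows, labels, n_trees=25, max_features=2, seed=0):
    """Bagged stumps: each sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]             # bootstrap sample
        feats = rng.sample(range(len(rows[0])), max_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote across stumps: 1 = severe, 0 = mild."""
    votes = [left_label if row[f] <= t else right_label
             for f, t, left_label, right_label in forest]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    forest = fit_forest(TOY_ROWS, TOY_LABELS)
    print(predict(forest, [38, 33, 18, 8500, 55, 0.85]))  # a severe-looking profile
    print(predict(forest, [1, 1, 0, 100, 2, 0.05]))       # a mild-looking profile
```

Because each tree here has only one split, drawing one feature subset per tree is equivalent to drawing one per split; with deeper trees, a random forest redraws the subset at every split.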
Compared with a manually evaluated reference standard (κ = 0.716), CAESAR differentiates between severe and mild phenotypes with a sensitivity of 91.67% and a specificity of 77.78%.
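Sensitivity, specificity, and Cohen's kappa are simple ratios over a confusion or agreement table; the sketch below spells them out. The abstract does not report the underlying counts, so TP = 11, FN = 1, TN = 7, FP = 2 are hypothetical values chosen only because they reproduce the reported percentages (22/24 and 14/18 would give the same figures). The abstract also does not say which two ratings its κ compares, so the agreement table here is invented and unrelated to the reported 0.716.

```python
def sensitivity(tp, fn):
    """True-positive rate: fraction of reference-severe phenotypes classified severe."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: fraction of reference-mild phenotypes classified mild."""
    return tn / (tn + fp)


def cohens_kappa(table):
    """Cohen's kappa for a 2x2 agreement table.

    Rows are rater A (severe, mild); columns are rater B (severe, mild).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_observed = (a + d) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    p_expected = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical counts that happen to reproduce the reported percentages.
    print(round(100 * sensitivity(tp=11, fn=1), 2))    # 91.67
    print(round(100 * specificity(tn=7, fp=2), 2))     # 77.78
    # Invented agreement table; unrelated to the reported kappa of 0.716.
    print(round(cohens_kappa([[20, 5], [3, 22]]), 2))  # 0.68
```

Kappa corrects raw agreement for the agreement two raters would reach by chance given how often each labels a phenotype severe, which is why it is reported alongside a manually built reference standard.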
CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.