Singh Karandeep, Betensky Rebecca A, Wright Adam, Curhan Gary C, Bates David W, Waikar Sushrut S
Division of Learning and Knowledge Systems, Department of Learning Health Sciences and.
Division of Nephrology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan; Departments of.
Clin J Am Soc Nephrol. 2016 Dec 7;11(12):2150-2158. doi: 10.2215/CJN.02420316. Epub 2016 Oct 10.
BACKGROUND AND OBJECTIVES: Identifying predictors of kidney disease progression is critical to developing strategies that prevent kidney failure. Clinical notes offer a unique opportunity for big data approaches to identify novel risk factors for disease.
DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: We used natural language processing tools to extract concepts from the preceding year's clinical notes among patients newly referred to a tertiary care center's outpatient nephrology clinics and retrospectively evaluated these concepts as predictors for the subsequent development of ESRD using proportional subdistribution hazards (competing risk) regression. The primary outcome was time to ESRD, accounting for a competing risk of death. We identified predictors from univariate and multivariate (adjusting for Tangri linear predictor) models using a 5% threshold for false discovery rate (q value <0.05). We included all patients seen by an adult outpatient nephrologist between January 1, 2004 and June 18, 2014 and excluded patients seen only by transplant nephrology, with preexisting ESRD, with fewer than five clinical notes, with no follow-up, or with no baseline creatinine values.
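The 5% false discovery rate threshold described above can be illustrated with a short sketch. The abstract does not say which procedure produced the q values, so this example assumes the Benjamini-Hochberg adjustment as a stand-in. The function names and the dictionary of per-concept p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: the study's q values may have been computed differently;
    BH is used here only to illustrate FDR control.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_concepts(p_by_concept, threshold=0.05):
    """Keep concepts whose adjusted p-value falls below the threshold."""
    names = list(p_by_concept)
    q_values = benjamini_hochberg([p_by_concept[name] for name in names])
    return {name: q for name, q in zip(names, q_values) if q < threshold}
```

In this sketch, each extracted concept would contribute one p-value from its competing risk regression model, and `select_concepts` would return the concepts that pass the q<0.05 cutoff.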
RESULTS: Among the 4013 patients in the final study cohort, we identified 960 concepts in the unadjusted analysis and 885 concepts in the adjusted analysis. Novel predictors included high-dose ascorbic acid (adjusted hazard ratio, 5.48; 95% confidence interval, 2.80 to 10.70; q<0.001) and fast food (adjusted hazard ratio, 4.34; 95% confidence interval, 2.55 to 7.40; q<0.001).
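The reported hazard ratios and confidence intervals can be related to each other with a small calculation. The sketch below assumes the intervals are Wald-type, that is, symmetric around the estimate on the log scale, which the abstract does not state. The function names are hypothetical.

```python
import math


def log_hr_standard_error(lower, upper, z_crit=1.96):
    """Standard error of log(HR) implied by a 95% interval.

    Assumption: the interval is log(HR) +/- z_crit * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def wald_z(hr, lower, upper):
    """Wald z statistic for the null hypothesis HR = 1."""
    return math.log(hr) / log_hr_standard_error(lower, upper)


# Values as reported in the abstract.
reported = {
    "high-dose ascorbic acid": (5.48, 2.80, 10.70),
    "fast food": (4.34, 2.55, 7.40),
}

for concept, (hr, lower, upper) in reported.items():
    # Under the log-symmetry assumption, the point estimate should sit
    # near the geometric mean of the interval bounds.
    midpoint = math.sqrt(lower * upper)
    print(concept, round(midpoint, 2), round(wald_z(hr, lower, upper), 2))
```

For both predictors the geometric mean of the bounds falls close to the reported hazard ratio, which is consistent with the log-symmetry assumption but does not confirm how the intervals were actually computed.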
CONCLUSIONS: Novel predictors of human disease may be identified using an unbiased approach to analyze text from the electronic health record.