Konerman Monica A, Zhang Yiwei, Zhu Ji, Higgins Peter D R, Lok Anna S F, Waljee Akbar K
Division of Gastroenterology, Department of Internal Medicine, University of Michigan Health System, Ann Arbor, MI.
VA Ann Arbor Health Services Research and Development Center of Clinical Management Research, Ann Arbor, MI.
Hepatology. 2015 Jun;61(6):1832-41. doi: 10.1002/hep.27750. Epub 2015 Mar 20.
Existing predictive models of risk of disease progression in chronic hepatitis C have limited accuracy. The aim of this study was to improve upon existing models by applying novel statistical methods that incorporate longitudinal data. Patients in the Hepatitis C Antiviral Long-term Treatment Against Cirrhosis trial were analyzed. Outcomes of interest were (1) fibrosis progression (increase of two or more Ishak stages) and (2) liver-related clinical outcomes (liver-related death, hepatic decompensation, hepatocellular carcinoma, liver transplant, or increase in Child-Turcotte-Pugh score to ≥7). Predictors included longitudinal clinical, laboratory, and histologic data. Models were constructed using logistic regression and two machine learning methods (random forest and boosting) to predict an outcome in the next 12 months. The control arm was used as the training data set (n = 349 clinical, n = 184 fibrosis), and the interferon arm was used for internal validation. The area under the receiver operating characteristic curve for longitudinal models of fibrosis progression was 0.78 (95% confidence interval [CI] 0.74-0.83) using logistic regression, 0.79 (95% CI 0.77-0.81) using random forest, and 0.79 (95% CI 0.77-0.82) using boosting. The area under the receiver operating characteristic curve for longitudinal models of clinical progression was 0.79 (95% CI 0.77-0.82) using logistic regression, 0.86 (95% CI 0.85-0.87) using random forest, and 0.84 (95% CI 0.82-0.86) using boosting. Longitudinal models outperformed baseline models for both outcomes (P < 0.0001). Longitudinal machine learning models had negative predictive values of 94% for both outcomes.
Prediction models that incorporate longitudinal data can capture nonlinear disease progression in chronic hepatitis C and thus outperform baseline models. Machine learning methods can capture complex relationships between predictors and outcomes, yielding more accurate predictions; our models can help target costly therapies to patients with the most urgent need, guide the intensity of clinical monitoring required, and provide prognostic information to patients.
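The two performance measures reported above, the area under the receiver operating characteristic curve and the negative predictive value, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy patients are assumptions made purely for illustration, and the numbers bear no relation to the trial data.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def negative_predictive_value(labels, scores, threshold):
    """Fraction of patients predicted negative (score below the threshold)
    who truly did not have the outcome."""
    predicted_negative = [y for y, s in zip(labels, scores) if s < threshold]
    if not predicted_negative:
        raise ValueError("no patient was predicted negative at this threshold")
    return predicted_negative.count(0) / len(predicted_negative)


# Hypothetical toy data: 1 = outcome within 12 months, 0 = no outcome.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
# Hypothetical predicted risks from some fitted model.
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"NPV at threshold 0.5: {negative_predictive_value(labels, scores, 0.5):.3f}")
```

A high negative predictive value means that patients the model flags as low risk rarely go on to have the outcome, which is the property that would support less intensive monitoring for that group.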