Ma Shuangge, Huang Jian
Department of Epidemiology and Public Health, Yale University, New Haven, CT, USA.
Cancer Inform. 2007 Oct 15;3:371-8.
Clinical covariates such as age, gender, tumor grade, and smoking history have been used extensively to predict disease occurrence and progression. Genomic biomarkers selected from microarray measurements may provide an alternative, satisfactory means of disease prediction. Recent studies show that better prediction can be achieved by using clinical and genomic biomarkers together. However, because clinical and genomic measurements have different characteristics, combining these covariates in disease prediction is very challenging. We propose a new regularization method, Covariate-Adjusted Threshold Gradient Directed Regularization (Cov-TGDR), for combining different types of covariates in disease prediction. The proposed approach performs biomarker selection and predictive model building simultaneously, and it allows different degrees of regularization for different types of covariates. As examples, we consider biomedical studies with binary outcomes and with right-censored survival outcomes, assuming a logistic model and a Cox model, respectively. Analyses of the Breast Cancer data and the Follicular lymphoma data show that the proposed approach can have better prediction performance than using clinical or genomic covariates alone.
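The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a threshold-gradient update with group-specific regularization could look like for the logistic model. It assumes the usual threshold gradient descent scheme (at each step, update only the coefficients whose gradient is large relative to the largest gradient) and assumes that the threshold is applied separately within the clinical and the genomic group. The function name `cov_tgdr_logistic` and the parameters `tau_clin`, `tau_gen`, `step`, and `n_steps` are hypothetical and not taken from the paper.

```python
import numpy as np


def cov_tgdr_logistic(X_clin, X_gen, y, tau_clin=0.0, tau_gen=0.9,
                      step=0.01, n_steps=200):
    """Illustrative thresholded gradient ascent for logistic regression.

    Assumption (not from the abstract): each covariate group has its own
    threshold in [0, 1]. A threshold near 0 updates every coefficient in
    the group (little selection); a threshold near 1 updates only the
    coefficients with the largest gradients (strong selection).
    """
    X = np.hstack([X_clin, X_gen])
    n, p = X.shape
    is_clin = np.arange(p) < X_clin.shape[1]  # True for clinical columns

    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_steps):
        # Gradient of the average logistic log-likelihood.
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = y - prob
        grad = X.T @ resid / n

        # Group-specific thresholding: compare each gradient with the
        # largest absolute gradient inside its own group.
        update = np.zeros(p, dtype=bool)
        for group, tau in ((is_clin, tau_clin), (~is_clin, tau_gen)):
            g = np.abs(grad[group])
            if g.size and g.max() > 0:
                update[group] = g >= tau * g.max()

        beta += step * grad * update      # only selected coefficients move
        intercept += step * resid.mean()  # intercept is never thresholded
    return intercept, beta
```

In this sketch, a low `tau_clin` with a high `tau_gen` keeps the few clinical covariates in the model while selecting sparsely among the many genomic ones. The thresholds and the number of steps act as tuning parameters and would typically be chosen by a data-driven criterion such as cross-validation. For right-censored outcomes, the same loop could in principle be used with the gradient of the Cox partial log-likelihood in place of the logistic gradient.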