Kamath Vidya P, Torres-Roca Javier F, Eschrich Steven A
Department of Biostatistics & Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
Int J Genomics. 2017;2017:6576840. doi: 10.1155/2017/6576840. Epub 2017 Feb 8.
The use of gene expression-based classifiers has resulted in a number of promising potential signatures of patient diagnosis, prognosis, and response to therapy. However, these approaches have also exposed the difficulty of using gene expression alone to predict a complex trait. A practical approach to this problem is to integrate existing biological knowledge with gene expression to build a composite predictor. We studied the problem of predicting radiation sensitivity within human cancer cell lines from gene expression. First, we present evidence for the need to integrate known biological conditions (tissue of origin, RAS, and p53 mutational status) into a gene expression prediction problem involving radiation sensitivity. Next, we demonstrate a technique for incorporating this knowledge using linear regression. The resulting correlations between gene expression and radiation sensitivity improved through the use of this technique (best-fit adjusted R² increased from 0.3 to 0.84). Overfitting of data was examined through the use of simulation. The results reinforce the concept that radiation sensitivity is not driven solely by gene expression, but rather by a combination of distinct parameters. We show that accounting for biological heterogeneity significantly improves the ability of the model to identify genes that are associated with radiosensitivity.
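The comparison described in the abstract, a gene-only linear model versus one that also includes tissue of origin, RAS status, and p53 status, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the effect sizes are invented, the names `simulate`, `adjusted_r2`, and `radiosens` are hypothetical, and the additive ordinary least-squares form is only one plausible reading of "linear regression". It is not the authors' implementation and does not reproduce their reported values.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit ordinary least squares (intercept added here) and return adjusted R^2."""
    n, p = X.shape  # p counts predictors, excluding the intercept
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def simulate(seed=0, n=60):
    """Synthetic cell lines (hypothetical): one gene plus three biological covariates."""
    rng = np.random.default_rng(seed)
    tissue = rng.integers(0, 4, size=n)   # four made-up tissues of origin
    ras = rng.integers(0, 2, size=n)      # 1 = RAS mutant (assumed coding)
    p53 = rng.integers(0, 2, size=n)      # 1 = p53 mutant (assumed coding)
    gene = rng.normal(size=n)             # expression of a single gene
    tissue_effect = np.array([0.0, 0.15, -0.15, 0.30])  # invented effect sizes
    radiosens = (0.4 + 0.05 * gene + tissue_effect[tissue]
                 + 0.10 * ras - 0.05 * p53
                 + rng.normal(scale=0.03, size=n))
    # One-hot encode tissue, dropping the first level as the reference category.
    tissue_dummies = np.eye(4)[tissue][:, 1:]
    covariates = np.column_stack([tissue_dummies, ras, p53])
    return gene, covariates, radiosens


gene, covariates, radiosens = simulate()

# Model 1: radiation sensitivity explained by gene expression alone.
r2_gene_only = adjusted_r2(gene[:, None], radiosens)

# Model 2: the same gene plus tissue of origin, RAS, and p53 status.
r2_full = adjusted_r2(np.column_stack([gene, covariates]), radiosens)

print(f"adjusted R^2, gene only:         {r2_gene_only:.2f}")
print(f"adjusted R^2, gene + covariates: {r2_full:.2f}")
```

Because the synthetic response is dominated by the tissue term, the gene-only fit leaves most of the variance unexplained while the covariate-adjusted fit captures it. Adjusted R² is used rather than raw R² so that the model with more predictors is penalized for its extra parameters.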