Chemical Biology & Therapeutics Science Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Operational Research and Financial Engineering, Princeton University, Princeton, NJ, USA.
Bioinformatics. 2018 Oct 1;34(19):3332-3339. doi: 10.1093/bioinformatics/bty199.
In recent years, several efforts have generated sensitivity profiles of collections of genomically characterized cell lines against panels of candidate therapeutic compounds. These data provide the basis for developing in silico models of sensitivity based on cellular, genetic, or expression biomarkers of cancer cells. However, efficiently identifying accurate sets of biomarkers to validate remains a challenge. To address this challenge, we developed a methodology that uses gene-expression profiles of human cancer cell lines to predict the responses of these cell lines to a panel of compounds.
We developed an iterative weighting scheme which, when applied to elastic net, a regularized regression method, significantly improves the overall accuracy of predictions, particularly in the highly sensitive response region. In addition to applying these methods to actual chemical sensitivity data, we used a simulation framework to investigate the effects of sample size, number of features, model sparsity, signal-to-noise ratio, and feature correlation on predictive performance, particularly in situations where the number of covariates is much larger than the sample size. Although our method is intended to aid therapeutic discovery and the understanding of the basic mechanisms of action of drugs and their targets, it is generally applicable in any domain where predictions of extreme responses are of the highest importance.
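The general idea of iteratively reweighting an elastic net can be sketched as follows. This is a minimal illustration, not the authors' R implementation. The weight rule is a hypothetical choice made for this example: it up-weights samples with low observed response and large current residuals, and it assumes that low response values mean high sensitivity. The function names, the parameters `tau`, `lam`, and `alpha`, and the toy simulator (AR(1)-correlated features, a sparse signal, noise scaled to a target signal-to-noise ratio) are likewise assumptions introduced here.

```python
import math
import random


def soft_threshold(z, g):
    """Soft-thresholding operator produced by the L1 part of the penalty."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def predict(b0, beta, X):
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]


def weighted_elastic_net(X, y, w, lam=0.05, alpha=0.5, n_sweeps=100):
    """Coordinate descent (fixed number of sweeps, no convergence check) for
    sum_i w_i (y_i - b0 - x_i.b)^2 / 2 + lam * (alpha |b|_1 + (1 - alpha)/2 |b|_2^2),
    with the weights w rescaled to sum to 1."""
    total = sum(w)
    w = [wi / total for wi in w]
    cols = list(zip(*X))
    denom = [sum(wi * x * x for wi, x in zip(w, col)) for col in cols]
    beta = [0.0] * len(cols)
    b0 = sum(wi * yi for wi, yi in zip(w, y))
    resid = [yi - b0 for yi in y]
    for _ in range(n_sweeps):
        # Unpenalized intercept: absorb the weighted mean residual.
        shift = sum(wi * ri for wi, ri in zip(w, resid))
        b0 += shift
        resid = [ri - shift for ri in resid]
        for j, col in enumerate(cols):
            # Weighted inner product of feature j with its partial residual.
            rho = sum(wi * x * ri for wi, x, ri in zip(w, col, resid)) + denom[j] * beta[j]
            new_bj = soft_threshold(rho, lam * alpha) / (denom[j] + lam * (1.0 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [ri - delta * x for ri, x in zip(resid, col)]
                beta[j] = new_bj
    return b0, beta


def reweighted_elastic_net(X, y, n_rounds=4, tau=1.0, lam=0.05, alpha=0.5):
    """Refit the elastic net several times, updating sample weights between fits.
    Returns the final fit and the weights that were used for it."""
    n = len(y)
    y_min, y_mean = min(y), sum(y) / n
    y_sd = math.sqrt(sum((yi - y_mean) ** 2 for yi in y) / n)
    w = [1.0] * n  # the first round is an ordinary, unweighted elastic net
    for r in range(n_rounds):
        b0, beta = weighted_elastic_net(X, y, w, lam, alpha)
        if r == n_rounds - 1:
            break
        pred = predict(b0, beta, X)
        # Hypothetical rule: favor low responses and poorly fitted samples.
        w = [math.exp(-(yi - y_min) / (tau * y_sd)) * (1.0 + abs(yi - pi))
             for yi, pi in zip(y, pred)]
    return b0, beta, w


def simulate(n=40, p=60, n_active=5, snr=4.0, rho=0.5, seed=0):
    """Toy p > n data: corr(x_j, x_k) = rho^|j-k|, sparse signal, Gaussian noise."""
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            row.append(rho * row[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        X.append(row)
    true_beta = [1.0] * n_active + [0.0] * (p - n_active)
    signal = predict(0.0, true_beta, X)
    mean_s = sum(signal) / n
    var_s = sum((s - mean_s) ** 2 for s in signal) / n
    noise_sd = math.sqrt(var_s / snr)  # noise variance = signal variance / snr
    y = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return X, y, true_beta


X, y, true_beta = simulate()
b0, beta, w = reweighted_elastic_net(X, y)
print("nonzero coefficients:", sum(1 for b in beta if b != 0.0))
print("largest / smallest weight:", max(w) / min(w))
```

Varying `n`, `p`, `n_active`, `snr`, and `rho` in `simulate` gives a rough way to explore how sample size, dimensionality, sparsity, noise, and feature correlation affect the fit. In practice, an established solver that accepts observation weights would replace the hand-written coordinate descent.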
The iterative and other weighting algorithms were implemented in R. The code is available at https://github.com/kiwtir/RWEN. The CTRP data are available at ftp://caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.1_2016_pub_NatChemBiol_12_109/ and the Sanger data at ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/.
Supplementary data are available at Bioinformatics online.