Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
Joint Department of Biomedical Engineering, University of North Carolina at Chapel Hill and North Carolina State University, Chapel Hill, North Carolina, United States of America.
PLoS Comput Biol. 2023 Feb 21;19(2):e1010888. doi: 10.1371/journal.pcbi.1010888. eCollection 2023 Feb.
Protein kinases play a vital role in a wide range of cellular processes, and compounds that inhibit kinase activity have emerged as a primary focus for targeted therapy development, especially in cancer. Consequently, efforts to characterize the behavior of kinases in response to inhibitor treatment, as well as downstream cellular responses, have been performed at increasingly large scales. Previous work with smaller datasets has used baseline profiling of cell lines and limited kinome profiling data to attempt to predict small molecule effects on cell viability, but these efforts did not use multi-dose kinase profiles and achieved low accuracy with very limited external validation. This work focuses on two large-scale primary data types, kinase inhibitor profiles and gene expression, to predict the results of cell viability screening. We describe the process by which we combined these data sets, examined their properties in relation to cell viability, and finally developed a set of computational models that achieve reasonably high prediction accuracy (R² of 0.78 and RMSE of 0.154). Using these models, we identified a set of kinases, several of which are understudied, that are strongly influential in the cell viability prediction models. We also tested whether a wider range of multiomics data sets could improve the model results and found that proteomic kinase inhibitor profiles were the single most informative data type. Finally, we validated a small subset of the model predictions in several triple-negative and HER2-positive breast cancer cell lines, demonstrating that the model performs well with compounds and cell lines that were not included in the training data set. Overall, this result demonstrates that generic knowledge of the kinome is predictive of very specific cell phenotypes and has the potential to be integrated into targeted therapy development pipelines.
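For readers less familiar with the two accuracy metrics quoted above, the following is a minimal sketch of how they are commonly computed. It assumes that the reported R2 is the coefficient of determination and that RMSE is the root-mean-square error between observed and predicted viability. The function names and the four data points are hypothetical illustrations, not values or code from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the observations."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


# Made-up viability values for illustration only (not data from the paper).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.4, 0.5, 0.8]

print(f"R^2  = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Under these definitions, R² measures the fraction of variance in observed viability explained by the predictions, while RMSE gives the typical size of a prediction error on the viability scale.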