Department of Pediatrics and Human Development, Michigan State University, Grand Rapids, MI 49503, USA.
Department of Pharmacology and Toxicology, Michigan State University, Grand Rapids, MI 49503, USA.
Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzad008.
Gene expression profiling of new or modified cell lines has become routine; however, obtaining comprehensive molecular characterization and cellular responses for a variety of cell lines, including those derived from underrepresented groups, is not trivial when resources are limited. Using gene expression to predict other measurements has been actively explored, but its predictive power across different measurement types has not been systematically investigated. Here, we evaluated commonly used machine learning methods and present TransCell, a two-step deep transfer learning framework that uses knowledge derived from pan-cancer tumor samples to predict molecular features and responses. Among these models, TransCell performed best in predicting metabolite, gene effect score (or genetic dependency), and drug sensitivity, and performed comparably in predicting mutation, copy number variation, and protein expression. Notably, TransCell improved performance by over 50% in drug sensitivity prediction and achieved a correlation of 0.7 in gene effect score prediction. Furthermore, predicted drug sensitivities revealed potential repurposing candidates for 100 new pediatric cancer cell lines, and predicted gene effect scores reflected BRAF resistance in melanoma cell lines. Together, we investigated the predictive power of gene expression for six molecular measurement types and developed a web portal (http://apps.octad.org/transcell/) that enables the prediction of 352,000 genomic and cellular response features solely from gene expression profiles.
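The two-step transfer idea in the abstract can be illustrated with a minimal sketch. This is not the TransCell implementation: PCA stands in for the pretrained deep encoder, ridge regression stands in for the fine-tuned prediction head, and all function names, dimensions, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def fit_encoder(tumor_expr, n_latent):
    """Step 1: learn a low-dimensional encoder from unlabeled tumor expression.
    PCA is an assumed linear stand-in for a pretrained deep encoder."""
    mean = tumor_expr.mean(axis=0)
    # Rows of vt are principal directions in gene space.
    _, _, vt = np.linalg.svd(tumor_expr - mean, full_matrices=False)
    return mean, vt[:n_latent].T  # shapes: (genes,), (genes, n_latent)

def encode(expr, mean, components):
    """Project expression profiles into the latent space learned in step 1."""
    return (expr - mean) @ components

def fit_head(latent, target, alpha=1.0):
    """Step 2: fit a ridge-regression head on the smaller labeled cell line set."""
    x = np.hstack([latent, np.ones((latent.shape[0], 1))])  # append bias column
    penalty = alpha * np.eye(x.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(x.T @ x + penalty, x.T @ target)

def predict(expr, mean, components, weights):
    """Encode new profiles and apply the fitted head."""
    return encode(expr, mean, components) @ weights[:-1] + weights[-1]

# Synthetic data (hypothetical): tumors and cell lines share latent structure.
rng = np.random.default_rng(0)
n_genes, n_latent = 50, 5
loadings = rng.normal(size=(n_latent, n_genes))
tumors = rng.normal(size=(500, n_latent)) @ loadings
tumors += 0.1 * rng.normal(size=tumors.shape)
cell_factors = rng.normal(size=(40, n_latent))
cells = cell_factors @ loadings + 0.1 * rng.normal(size=(40, n_genes))
# Simulated response (e.g. a drug sensitivity value) driven by the latent factors.
response = cell_factors @ rng.normal(size=n_latent) + 0.1 * rng.normal(size=40)

# Step 1 uses many unlabeled tumors; step 2 uses only 30 labeled cell lines.
mean, components = fit_encoder(tumors, n_latent)
weights = fit_head(encode(cells[:30], mean, components), response[:30])

# Evaluate on the 10 held-out cell lines with a Pearson correlation.
preds = predict(cells[30:], mean, components, weights)
corr = np.corrcoef(preds, response[30:])[0, 1]
print(f"held-out Pearson correlation: {corr:.2f}")
```

The design point is that the encoder is fitted on the large tumor set without any response labels, so the small labeled cell line set only has to fit a handful of head parameters.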