Kim Yutae, Lee Doheon
Dept. of Bio and Brain Engineering, KAIST, 291, Daehak-ro, Yuseong-gu, Daejeon, 34141, Korea.
Comput Struct Biotechnol J. 2025 Jun 11;27:2566-2573. doi: 10.1016/j.csbj.2025.06.018. eCollection 2025.
Human cell line models are essential for understanding diseases and cellular functions. They are especially important in drug discovery because they enable systematic screening of chemical compounds and their effects. However, heterogeneous measurement techniques and the fragmented characterization of cell lines across chemical screening and omics datasets make these models difficult to use to their full potential. To address this, we introduce an unsupervised deep learning model based on contrastive learning that integrates heterogeneous drug response screening data into a unified cell line embedding. We trained the embedding model on drug response data from 1,136 cell lines and then embedded 537 additional cell lines that were not included in training, thereby covering all 1,673 cancer cell lines in the Cancer Dependency Map (DepMap) that have corresponding gene expression data. We demonstrate that incorporating the embedding improves performance, to varying degrees, on downstream drug response-related machine learning tasks, including prediction of drug synergy and of drug response in cell lines. Furthermore, we applied SHapley Additive exPlanations (SHAP) to identify genes that contribute substantially to the embedding and found that these genes are strongly associated with drug resistance across multiple cancer types.
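As an illustration of the contrastive-learning idea mentioned in the abstract, the sketch below computes an InfoNCE-style loss, one common contrastive objective. The abstract does not state which loss or architecture the authors used, so this is an assumption and not their method. The function names (`l2_normalize`, `info_nce`), the temperature value, and the toy vectors are all hypothetical. Each row stands for one cell line as seen through two different screening sources; the loss is low when the two views of the same cell line are more similar to each other than to views of other cell lines.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Assumption for this sketch: row i of view_a and row i of view_b are
    embeddings of the same cell line from two different data sources
    (the positive pair); every other row of view_b acts as a negative.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature for other in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching row as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(a)


# Toy data (hypothetical): three cell lines, two views each.
aligned_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Rotating the second view breaks the correspondence between rows.
shuffled_b = [aligned_b[1], aligned_b[2], aligned_b[0]]

loss_aligned = info_nce(aligned_a, aligned_b)
loss_shuffled = info_nce(aligned_a, shuffled_b)
print(f"aligned pairs: {loss_aligned:.4f}, mismatched pairs: {loss_shuffled:.4f}")
```

In this toy setting the loss for correctly paired views is lower than for mismatched views. This is the signal a contrastive model would use to pull together different measurements of the same cell line.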