Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States.
Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States.
J Chem Inf Model. 2024 May 27;64(10):4071-4088. doi: 10.1021/acs.jcim.3c01060. Epub 2024 May 13.
Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most effective drugs for each cell line remains a significant challenge. To tackle this challenge, we developed multiple neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches, which primarily use regression and classification techniques for drug response prediction, we formulated drug selection and prioritization as a drug ranking problem. In this work, we proposed multiple pairwise and listwise neural ranking methods that learn latent representations of drugs and cell lines and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed a neural pairwise ranking method, Pair-PushC, and a neural listwise ranking method, List-One, on top of the existing methods pLETORg and ListNet, respectively. Additionally, we proposed a novel listwise ranking method, List-All, which, unlike List-One, focuses on all the effective drugs instead of only the top effective drug. We also provide an exhaustive empirical evaluation against state-of-the-art regression and ranking baselines on large-scale data sets across multiple experimental settings. Our results demonstrate that our proposed ranking methods mostly outperform the best baselines, with significant improvements of as much as 25.6% in terms of selecting truly effective drugs within the top 20 predicted drugs (i.e., hit@20) across 50% of the test cell lines.
Furthermore, our analyses suggest that the latent spaces learned by our proposed methods exhibit informative clustering structures and capture relevant underlying biological features. Finally, our comprehensive evaluation provides an objective comparison of the performance of different methods, including our proposed ones.
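To make the ranking formulation and the hit@k metric concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes hit@k is the count of truly effective drugs among the k highest-scored drugs for one cell line; the abstract does not give the exact definition. The loss shown is the generic ListNet top-one cross-entropy on which List-One is said to build. The exact List-One, List-All, and Pair-PushC objectives are not given in the abstract and are not reproduced here. The function names `hit_at_k`, `softmax`, and `listnet_top_one_loss` are hypothetical.

```python
import math


def hit_at_k(scores, effective, k=20):
    """Count truly effective drugs among the k highest-scored drugs.

    scores    -- predicted score per drug for one cell line (higher = better)
    effective -- set of drug indices treated as truly effective (assumed given)
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if i in effective)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top_one_loss(pred_scores, true_scores):
    """Generic ListNet top-one loss for one cell line.

    Cross-entropy between the top-one probability distribution induced by
    the ground-truth scores and the one induced by the predicted scores.
    """
    p_true = softmax(true_scores)
    p_pred = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


if __name__ == "__main__":
    # Toy cell line with four drugs; drugs 0 and 1 are assumed effective.
    scores = [0.9, 0.1, 0.8, 0.3]
    print(hit_at_k(scores, {0, 1}, k=2))
    print(listnet_top_one_loss([3.0, 2.0, 1.0], [3.0, 2.0, 1.0]))
```

In this sketch, a listwise loss compares whole score distributions per cell line, whereas a pairwise method would instead penalize individual drug pairs that are ordered incorrectly.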