Vasanthakumari Priyanka, Zhu Yitan, Brettin Thomas, Partin Alexander, Shukla Maulik, Xia Fangfang, Narykov Oleksandr, Weil Michael Ryan, Stevens Rick L
Division of Data Science and Learning, Argonne National Laboratory, Lemont, IL 60439, USA.
Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, USA.
Cancers (Basel). 2024 Jan 26;16(3):530. doi: 10.3390/cancers16030530.
It is well known that cancers of the same histology type can respond differently to a treatment. Computational drug response prediction is therefore important for both preclinical drug screening studies and clinical treatment design. To build drug response prediction models, treatment response data must be generated through screening experiments and used to train the models. In this study, we investigate active learning strategies for selecting experiments to generate response data, with two goals: (1) improving the performance of drug response prediction models built on the data and (2) identifying effective treatments. We focus on constructing drug-specific response prediction models for cancer cell lines. Various approaches have been designed and applied to select cell lines for screening, including random, greedy, uncertainty, diversity, combined greedy and uncertainty, sampling-based hybrid, and iteration-based hybrid approaches. All of these approaches are evaluated and compared using two criteria: (1) the number of identified hits, that is, selected experiments validated to be responsive, and (2) the performance of the response prediction model trained on the data from the selected experiments. The analysis was conducted for 57 drugs, and the results show a significant improvement in identifying hits using active learning approaches compared with the random and greedy sampling methods. Active learning approaches also show an improvement in response prediction performance for some of the drugs and analysis runs compared with the greedy sampling method.
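The selection strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the convention that a higher score means a more responsive cell line, the additive weighting used to combine greedy and uncertainty selection, and the hit threshold are all assumptions made for illustration. The sketch covers only the random, greedy, uncertainty, and combined strategies, and takes the model's predicted response and its uncertainty as given inputs.

```python
import random
from typing import Dict, List


def select_random(pool: List[str], k: int, seed: int = 0) -> List[str]:
    """Pick k cell lines uniformly at random (baseline strategy)."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def select_top(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k keys with the highest score; ties are broken by name."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def select_greedy(pred_mean: Dict[str, float], k: int) -> List[str]:
    """Greedy: pick the cell lines predicted to be most responsive.

    Assumption: a higher predicted value means a stronger response.
    """
    return select_top(pred_mean, k)


def select_uncertainty(pred_std: Dict[str, float], k: int) -> List[str]:
    """Uncertainty: pick the cell lines the model is least sure about."""
    return select_top(pred_std, k)


def select_greedy_uncertainty(
    pred_mean: Dict[str, float],
    pred_std: Dict[str, float],
    k: int,
    weight: float = 1.0,
) -> List[str]:
    """Combined strategy (hypothetical form): rank by mean + weight * std."""
    combined = {c: pred_mean[c] + weight * pred_std[c] for c in pred_mean}
    return select_top(combined, k)


def count_hits(
    selected: List[str], true_response: Dict[str, float], threshold: float
) -> int:
    """Count selected cell lines whose measured response meets the threshold."""
    return sum(1 for c in selected if true_response[c] >= threshold)


# Toy data for four hypothetical cell lines.
pred_mean = {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2}
pred_std = {"A": 0.05, "B": 0.10, "C": 0.50, "D": 0.30}
true_response = {"A": 0.8, "B": 0.3, "C": 0.9, "D": 0.1}

for name, picked in [
    ("greedy", select_greedy(pred_mean, 2)),
    ("uncertainty", select_uncertainty(pred_std, 2)),
    ("combined", select_greedy_uncertainty(pred_mean, pred_std, 2)),
]:
    print(name, picked, count_hits(picked, true_response, threshold=0.5))
```

In a full active learning loop, a selection step like this would alternate with retraining the prediction model on the newly screened cell lines. The two evaluation criteria in the abstract correspond to tracking the cumulative hit count and the model's prediction performance across iterations.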