Nakano Takumi, Takeda Shunichi, Brown J B
Kyoto University Graduate School of Medicine, Department of Molecular Biosciences, Life Science Informatics Research Unit, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan. Email:
Kyoto University Graduate School of Medicine, Department of Radiation Genetics, Konoemachi Yoshida Sakyo, Kyoto 606-8501, Japan.
RSC Med Chem. 2020 Jul 20;11(9):1075-1087. doi: 10.1039/d0md00110d. eCollection 2020 Sep 1.
The NCI-60 cancer cell line screening panel has provided insights for the development of subtype-specific chemical therapies and for drug repurposing. By extracting patterns from chemical structure and cytotoxicity data, virtual screening can complement high-throughput assay platforms and improve bioactive compound discovery rates through computational prefiltering of candidate compound libraries. Many groups report high predictive performance for computational models of NCI-60 data when using cross-validation or similar techniques, yet prospective therapy development in novel cancers may have little or no such data and may also lack the resources to perform hit identification with large compound libraries. In contrast to bulk screening and analysis, active learning has demonstrated how to select compounds for screening in small batches and update computational models iteratively, yielding predictive models built from a minimal number of compounds and, importantly, clarifying the data volumes at which limits in predictive ability are reached. Here, in replicate per-cell-line experiments using 50% of the data (∼20 000 compounds) as the external prediction target, predictive limits are reproducibly reached once 10-30% of the incorporable half has been systematically selected. The pattern was consistent across all 60 cell lines. Limits of predictability are found to correlate with the doubling times of the cell lines and with the number of cellular response discontinuities (activity cliffs) present per cell line. Organizing compounds into chemical scaffolds delineated degrees of predictive challenge. These results provide key insights for strategies in developing new inhibitors in existing cell lines or for future automated therapy selection in personalized oncotherapy.
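To make the experimental design above more concrete, the following is a minimal, illustrative sketch of pool-based active learning against a held-out external half. It is not the authors' method: the abstract does not specify the model or selection rule, so this sketch assumes a k-nearest-neighbour classifier, uncertainty sampling (choosing compounds whose neighbour vote is closest to 0.5), synthetic data in place of NCI-60 descriptors and cytotoxicity labels, and a simple tolerance rule for where the learning curve "plateaus". All function names (`make_synthetic_data`, `active_learning_curve`, `plateau_fraction`) are hypothetical.

```python
import random


def make_synthetic_data(n, dim, seed):
    """Hypothetical stand-in for compound descriptors with active/inactive labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        # Invented labelling rule for illustration only: "active" if x0 + x1 > 1.
        y = 1 if x[0] + x[1] > 1.0 else 0
        data.append((x, y))
    return data


def knn_vote(train, x, k):
    """Fraction of the k nearest labelled examples (squared Euclidean) that are active."""
    nearest = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x))
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def accuracy(train, test, k):
    """Share of external compounds whose majority-vote prediction matches the label."""
    correct = sum(1 for x, y in test if (knn_vote(train, x, k) >= 0.5) == (y == 1))
    return correct / len(test)


def active_learning_curve(data, batch_size, n_rounds, k=5, seed=0):
    """Return (fraction of pool used, external accuracy) after each batch."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # One half is the incorporable pool, the other the external prediction target.
    pool, external = shuffled[:half], shuffled[half:]
    labelled = [pool.pop() for _ in range(batch_size)]  # random initial batch
    curve = [(len(labelled) / half, accuracy(labelled, external, k))]
    for _ in range(n_rounds):
        if not pool:
            break
        # Uncertainty sampling: most ambiguous pool compounds come first.
        pool.sort(key=lambda item: abs(knn_vote(labelled, item[0], k) - 0.5))
        labelled.extend(pool[:batch_size])
        del pool[:batch_size]
        curve.append((len(labelled) / half, accuracy(labelled, external, k)))
    return curve


def plateau_fraction(curve, tolerance=0.01):
    """First pool fraction whose accuracy is within `tolerance` of the best seen."""
    best = max(acc for _, acc in curve)
    for frac, acc in curve:
        if acc >= best - tolerance:
            return frac


if __name__ == "__main__":
    data = make_synthetic_data(400, 4, seed=1)
    curve = active_learning_curve(data, batch_size=10, n_rounds=10)
    for frac, acc in curve:
        print(f"{frac:.0%} of pool -> external accuracy {acc:.3f}")
    print("plateau reached at pool fraction", plateau_fraction(curve))
```

The tolerance-based `plateau_fraction` is only one possible way to operationalize a "predictive limit"; a study at NCI-60 scale would substitute its own model, descriptors, and stopping criterion.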