School of Information Science and Technology, ShanghaiTech University, Shanghai, China.
Lingang Laboratory, Shanghai, China.
Nat Commun. 2024 Oct 20;15(1):9058. doi: 10.1038/s41467-024-52900-7.
Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmark has been lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance on both classification and ranking tasks across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques. We observe that all the methods perform significantly better when data quality is improved, e.g., by excluding computationally derived SLs from training and by sampling negative labels based on gene expression. Among the methods, SLMGAE performs best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with the freely available source code and datasets, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.
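To make two of the evaluation axes above concrete, the following is a minimal sketch of a cold-start style split and of negative sampling at a fixed ratio. It is not the benchmark's implementation: the function names (`cold_start_split`, `sample_negatives`), the rule that any pair touching a held-out gene goes to the test set, and uniform random sampling of unlabeled pairs are all assumptions made for illustration. The gene names `G1` to `G6` are placeholders, not real SL data.

```python
import random


def canonical(a, b):
    """Order a gene pair so that (a, b) and (b, a) compare equal."""
    return (a, b) if a < b else (b, a)


def cold_start_split(positive_pairs, holdout_fraction=0.3, seed=0):
    """Hold out a random subset of genes (assumed split rule).

    Pairs whose genes are both outside the held-out set go to training;
    pairs touching at least one held-out gene go to the test set.
    """
    rng = random.Random(seed)
    genes = sorted({g for pair in positive_pairs for g in pair})
    rng.shuffle(genes)
    n_holdout = max(1, int(len(genes) * holdout_fraction))
    holdout = set(genes[:n_holdout])
    train = [p for p in positive_pairs
             if p[0] not in holdout and p[1] not in holdout]
    test = [p for p in positive_pairs
            if p[0] in holdout or p[1] in holdout]
    return train, test, holdout


def sample_negatives(positive_pairs, genes, neg_ratio=1, seed=0):
    """Draw neg_ratio negatives per positive, uniformly at random,
    from gene pairs that are not known positives (assumed scheme).

    Assumes every gene in positive_pairs is also in `genes`.
    """
    rng = random.Random(seed)
    known = {canonical(a, b) for a, b in positive_pairs}
    genes = sorted(genes)
    target = neg_ratio * len(known)
    max_possible = len(genes) * (len(genes) - 1) // 2 - len(known)
    if target > max_possible:
        raise ValueError("not enough unlabeled pairs for this ratio")
    negatives = set()
    while len(negatives) < target:
        a, b = rng.sample(genes, 2)  # two distinct genes
        pair = canonical(a, b)
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)


# Toy input with placeholder gene names (hypothetical, not real SL pairs).
pairs = [("G1", "G2"), ("G1", "G3"), ("G2", "G4"),
         ("G3", "G5"), ("G4", "G6"), ("G5", "G6")]
all_genes = {g for p in pairs for g in p}

train, test, holdout = cold_start_split(pairs)
negatives = sample_negatives(pairs, all_genes, neg_ratio=1)
```

Raising `neg_ratio` changes the class balance a method is trained and evaluated on, and swapping the uniform draw for an expression-based filter would be one way to approximate the expression-based negative sampling mentioned above; both variations are left out of this sketch.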