Simple controls exceed best deep learning algorithms and reveal foundation model effectiveness for predicting genetic perturbations.

Author Information

Daniel R. Wong, Abby S. Hill, Rob Moccia

Affiliation

Pfizer Worldwide Research, Development and Medical, Machine Learning and Computational Sciences, Cambridge, MA 02139, United States.

Publication Information

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf317.

Abstract

MOTIVATION

Modeling genetic perturbations and their effect on the transcriptome is a key area of pharmaceutical research. Because of the transcriptome's complexity, deep learning (DL) has attracted much excitement and development for its ability to model complex relationships. In particular, the transformer-based foundation model paradigm has emerged as the gold standard for predicting post-perturbation responses. However, understanding of these increasingly complex models and evaluation of their practical utility are lacking, as are simple but appropriate benchmarks for comparing predictive methods.

RESULTS

Here, we present a simple baseline method that outperforms both the state of the art (SOTA) in DL and other, simpler proposed neural architectures, establishing a necessary benchmark for evaluation in the field of post-perturbation prediction. We also elucidate the utility of foundation models for post-perturbation prediction via generalizable fine-tuning experiments that can be translated to other applications of transformer-based foundation models on tasks of interest. Furthermore, we provide a corrected version of a popular dataset used for benchmarking perturbation-prediction models. We hope this work will properly contextualize further development of DL models in the perturbation space with the necessary control procedures.
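
The abstract does not describe the baseline's mechanics, so the sketch below is only an illustration of what a simple control of this kind typically looks like: for every held-out perturbation, predict the mean post-perturbation expression profile observed in training, and score predictions gene-wise against the observed profiles. The NumPy arrays, function names, and Pearson-correlation metric are assumptions for illustration, not the authors' implementation (which is in the linked repository).

import numpy as np

def mean_baseline_predict(train_profiles, n_test):
    # Simple control: predict the same averaged post-perturbation expression
    # profile for every held-out perturbation (illustrative assumption, not
    # necessarily the authors' exact method).
    mean_profile = train_profiles.mean(axis=0)   # average over training perturbations
    return np.tile(mean_profile, (n_test, 1))    # identical prediction per test perturbation

def pearson_per_perturbation(pred, obs):
    # Gene-wise Pearson correlation between predicted and observed expression,
    # computed separately for each perturbation (assumed evaluation metric).
    pred_c = pred - pred.mean(axis=1, keepdims=True)
    obs_c = obs - obs.mean(axis=1, keepdims=True)
    num = (pred_c * obs_c).sum(axis=1)
    den = np.sqrt((pred_c ** 2).sum(axis=1) * (obs_c ** 2).sum(axis=1))
    return num / den

# Toy usage: 100 training and 20 held-out perturbations, 2000 genes.
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 2000))
test_obs = rng.normal(size=(20, 2000))
preds = mean_baseline_predict(train, n_test=test_obs.shape[0])
print(pearson_per_perturbation(preds, test_obs).mean())

A control like this has no learned parameters, which is what makes it a useful benchmark: a DL model that cannot beat it is not capturing perturbation-specific signal.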

AVAILABILITY AND IMPLEMENTATION

All source code is available at: https://github.com/pfizer-opensource/perturb_seq. The DOI is 10.5281/zenodo.15352937.
