Soper Braden, Lisicki Michal, Silva Mary, Cadena Jose, Zhu Haonan, Sundaram Shivshankar, Ray Priyadip, Drocco Jeff
Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, 94550, USA.
School of Engineering, University of Guelph, Guelph, ON, N1G 2W1, Canada.
Sci Rep. 2025 Aug 25;15(1):31196. doi: 10.1038/s41598-025-13972-7.
In silico methods for predicting the effects of multi-gene perturbations hold great promise for advancing functional genomics, computational drug discovery, and disease modeling. However, the development of these predictive algorithms for mammalian systems has been hampered by limited datasets and high experimental costs. In this study, we present a Bayesian active learning framework designed to discover pairwise host gene knockdowns that effectively inhibit viral proliferation in an in vitro HIV-1 infection model. Our method leverages a biological knowledge graph as side information and employs a computationally efficient batch diversification approach. We evaluated this framework using a dataset of viral load measurements obtained from multi-day dual-gene depletion experiments, encompassing all possible pairwise knockdowns of over 350 host genes associated with HIV infection. We demonstrate that our framework rapidly identifies the most effective gene knockdown pairs for reducing viral load. Furthermore, we show that incorporating side information enhances performance during the early stages of active learning (low data regime), while our batch diversification strategy significantly boosts performance in later stages (high data regime). This framework is general and can be adapted to explore gene interactions in other contexts, such as synthetic lethality prediction and mapping epistatic effects across quantitative trait loci.
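The abstract does not give implementation details, so the following is only a minimal sketch of what a batch Bayesian active learning loop over gene pairs could look like. Everything in it is an assumption made for illustration: Bayesian linear regression as the surrogate model, Thompson sampling as the acquisition rule, a per-gene cap as the batch diversification step, random per-gene embeddings (`emb`) standing in for knowledge-graph side information, and a synthetic `oracle` standing in for the viral load measurement. None of these names or choices come from the paper.

```python
import itertools
import numpy as np

def pair_features(emb, pairs):
    # Symmetric pair features built from per-gene embeddings (hypothetical
    # stand-in for knowledge-graph side information).
    a, b = emb[pairs[:, 0]], emb[pairs[:, 1]]
    return np.hstack([a + b, a * b])

def posterior(X, y, alpha=1.0, noise=0.1):
    # Bayesian linear regression with prior w ~ N(0, I / alpha) and Gaussian
    # observation noise; returns the posterior mean and precision matrix.
    precision = alpha * np.eye(X.shape[1]) + X.T @ X / noise**2
    mean = np.linalg.solve(precision, X.T @ y / noise**2)
    return mean, precision

def select_batch(scores, cand_pairs, batch_size, max_per_gene=2):
    # Greedy diversification (an assumed heuristic): walk candidates from best
    # to worst score and skip any pair whose gene is already used
    # max_per_gene times in the current batch.
    counts, chosen = {}, []
    for i in np.argsort(-scores):
        g1, g2 = int(cand_pairs[i, 0]), int(cand_pairs[i, 1])
        if counts.get(g1, 0) < max_per_gene and counts.get(g2, 0) < max_per_gene:
            chosen.append(int(i))
            counts[g1] = counts.get(g1, 0) + 1
            counts[g2] = counts.get(g2, 0) + 1
            if len(chosen) == batch_size:
                break
    return np.array(chosen, dtype=int)

def run_active_learning(emb, oracle, n_rounds=5, batch_size=10, seed=0):
    rng = np.random.default_rng(seed)
    pairs = np.array(list(itertools.combinations(range(emb.shape[0]), 2)))
    X = pair_features(emb, pairs)
    # Start from a random batch of measured pairs.
    labeled = [int(i) for i in rng.choice(len(pairs), size=batch_size, replace=False)]
    y = {i: oracle(pairs[i]) for i in labeled}
    for _ in range(n_rounds):
        idx = np.array(labeled)
        mean, precision = posterior(X[idx], np.array([y[i] for i in labeled]))
        # Thompson sampling: draw w ~ N(mean, precision^-1) via Cholesky.
        L = np.linalg.cholesky(precision)
        w = mean + np.linalg.solve(L.T, rng.standard_normal(len(mean)))
        pool = np.setdiff1d(np.arange(len(pairs)), idx)
        scores = -(X[pool] @ w)  # lower predicted viral load scores higher
        for i in pool[select_batch(scores, pairs[pool], batch_size)]:
            y[int(i)] = oracle(pairs[i])
            labeled.append(int(i))
    return pairs, labeled, y

# Toy usage with synthetic data (20 hypothetical genes, 4-dim embeddings).
toy_rng = np.random.default_rng(1)
emb = toy_rng.normal(size=(20, 4))
true_w = toy_rng.normal(size=8)

def oracle(pair):
    # Synthetic "viral load": a hidden linear function of the pair features.
    return float((pair_features(emb, np.array([pair])) @ true_w)[0])

pairs, labeled, y = run_active_learning(emb, oracle)
best = min(labeled, key=lambda i: y[i])
print("best pair found:", tuple(int(g) for g in pairs[best]), "load:", y[best])
```

In this sketch the per-gene cap keeps any single gene from dominating a batch, which is one simple way to spread measurements across the pair space; the framework described in the abstract may diversify batches quite differently.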