Data Science, Netrias, LLC, Annapolis, MD 21409, USA.
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Bioinformatics. 2022 Jan 3;38(2):404-409. doi: 10.1093/bioinformatics/btab676.
Applications in synthetic and systems biology can benefit from measuring the whole-cell response to biochemical perturbations. However, running experiments that cover every possible combination of perturbations is infeasible. In this paper, we present the host response model (HRM), a machine learning approach that maps the transcriptional responses to single perturbations to the transcriptional response to their combination.
The HRM combines high-throughput sequencing with machine learning, inferring links between experimental context, prior knowledge of cell regulatory networks, and RNASeq data to predict a gene's dysregulation. We find that, using data from single inducers, the HRM predicts the direction of dysregulation in response to a combination of inducers with >90% accuracy. We further find that using prior, known cell regulatory networks doubles the predictive performance of the HRM (R² increases from 0.3 to 0.65). The model was validated in two organisms, Escherichia coli and Bacillus subtilis, using new experiments conducted after training. Finally, although the HRM is trained on gene expression data, it predicts differential expression directly, so its predictions can also be used for enrichment analyses. We show that the HRM correctly classifies >95% of pathway regulations. The HRM reduces the number of RNASeq experiments needed, because responses can be tested in silico before running the experiment.
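To make the evaluation claims above concrete, here is a minimal sketch of how directional accuracy, R², and a pathway-level direction call could be computed for a predicted combination response. It is not the HRM. As an assumption, a naive additive baseline (summing single-inducer log2 fold changes) stands in for the learned model, and the function names, gene names, values, and the mean-sign pathway rule are all hypothetical.

```python
"""Hypothetical sketch: scoring predicted responses to a combination of inducers.

Assumption: an additive log2 fold-change (LFC) baseline stands in for the HRM's
learned model; all names and numbers below are made up for illustration.
"""
from statistics import mean

# Hypothetical single-inducer LFCs and an observed combination response.
inducer_a = {"geneA": 1.2, "geneB": -0.5, "geneC": 0.3, "geneD": -1.0}
inducer_b = {"geneA": 0.8, "geneB": -0.2, "geneC": -0.6, "geneD": -0.4}
observed_combo = {"geneA": 1.7, "geneB": -0.9, "geneC": 0.2, "geneD": -1.1}


def additive_prediction(single_lfcs: list[dict[str, float]]) -> dict[str, float]:
    """Predict each gene's combination LFC as the sum of its single-inducer LFCs."""
    genes = set().union(*single_lfcs)
    return {g: sum(d.get(g, 0.0) for d in single_lfcs) for g in genes}


def direction(x: float) -> int:
    """Return +1 for up-regulation, -1 for down-regulation, 0 for no change."""
    return (x > 0) - (x < 0)


def directional_accuracy(pred: dict[str, float], obs: dict[str, float]) -> float:
    """Fraction of genes whose predicted direction matches the observed direction."""
    hits = sum(direction(pred[g]) == direction(obs[g]) for g in obs)
    return hits / len(obs)


def r_squared(pred: dict[str, float], obs: dict[str, float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot over the observed genes."""
    y_bar = mean(obs.values())
    ss_res = sum((obs[g] - pred[g]) ** 2 for g in obs)
    ss_tot = sum((v - y_bar) ** 2 for v in obs.values())
    return 1.0 - ss_res / ss_tot


def pathway_direction(lfcs: dict[str, float], pathway_genes: list[str]) -> int:
    """Hypothetical rule: call a pathway up/down by the sign of its mean LFC."""
    return direction(mean(lfcs[g] for g in pathway_genes))


if __name__ == "__main__":
    pred = additive_prediction([inducer_a, inducer_b])
    print("directional accuracy:", directional_accuracy(pred, observed_combo))
    print("R^2:", round(r_squared(pred, observed_combo), 3))
    for name, genes in {"stress": ["geneA", "geneC"], "motility": ["geneB", "geneD"]}.items():
        print(name, pathway_direction(pred, genes), pathway_direction(observed_combo, genes))
```

On these toy values the additive baseline gets the direction of `geneC` wrong (0.3 and -0.6 sum to a negative prediction, but the observed response is positive), which is the kind of non-additive case where a learned model has room to improve on such a baseline.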
The HRM software and tutorial are available at https://github.com/sd2e/CDM and the configurable differential expression analysis tools and tutorials are available at https://github.com/SD2E/omics_tools.
Supplementary data are available at Bioinformatics online.