Interdisciplinary Computing and Complex BioSystems (ICOS) Group, School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
Centre for Bacterial Cell Biology, Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
Sensors (Basel). 2021 Apr 1;21(7):2436. doi: 10.3390/s21072436.
A goal of the biotechnology industry is to recognise detrimental cellular states that may lead to suboptimal or anomalous growth in a bacterial population. Our current knowledge of how different environmental treatments modulate gene regulation and bring about physiological adaptations is limited, and hence it is difficult to determine the mechanisms that lead to their effects. Patterns of gene expression, revealed using technologies such as microarrays or RNA-seq, can provide useful biomarkers of the gene regulatory states that indicate a bacterium's physiological status. A panel of only a few key biomarker genes is desirable, because it reduces the cost of determining the transcriptional state and opens the way for methods such as quantitative RT-PCR and amplicon panels. In this paper, we used unsupervised machine learning to construct a transcriptional landscape model from condition-dependent transcriptome data, from which we identified 10 clusters of samples that have distinct gene expression profiles and are linked to different cellular growth states. Using an iterative feature elimination strategy, we identified a minimal panel of 10 biomarker genes that achieved 100% cross-validation accuracy in predicting the cluster assignment. Moreover, we designed and evaluated a variety of data processing strategies to ensure that our methods generate meaningful transcriptional landscape models that capture relevant biological processes. Overall, the computational strategies introduced in this study facilitate the identification of a detailed set of relevant cellular growth states and show how to sense them using a reduced biomarker panel.
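To make the iterative feature elimination idea concrete, the following is a minimal sketch and not the authors' pipeline. It assumes cluster labels are already available (the unsupervised clustering step is skipped), uses a hypothetical nearest-centroid classifier with plain k-fold cross-validation, and runs on synthetic data in which only two of eight "genes" carry cluster information. All function names, the classifier choice, and the data are illustrative assumptions; the abstract does not specify them.

```python
import random
from statistics import mean

def centroids(X, y, genes):
    """Mean expression profile of each cluster, restricted to `genes`."""
    return {
        label: [mean(row[g] for row, l in zip(X, y) if l == label) for g in genes]
        for label in sorted(set(y))
    }

def predict(cents, row, genes):
    """Assign `row` to the cluster with the nearest centroid (squared Euclidean)."""
    def dist(label):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, cents[label]))
    return min(cents, key=dist)

def cv_accuracy(X, y, genes, folds=5):
    """k-fold cross-validation accuracy using only the columns in `genes`."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        cents = centroids([X[i] for i in train], [y[i] for i in train], genes)
        correct += sum(
            predict(cents, X[i], genes) == y[i]
            for i in range(len(X)) if i % folds == f
        )
    return correct / len(X)

def eliminate_features(X, y, min_accuracy=1.0):
    """Backward elimination: repeatedly drop the gene whose removal keeps
    cross-validation accuracy highest, stopping before it falls below
    `min_accuracy`."""
    genes = list(range(len(X[0])))
    while len(genes) > 1:
        best_gene, best_acc = None, -1.0
        for g in genes:
            acc = cv_accuracy(X, y, [h for h in genes if h != g])
            if acc > best_acc:
                best_gene, best_acc = g, acc
        if best_acc < min_accuracy:
            break  # every remaining gene is needed
        genes.remove(best_gene)
    return genes

# Synthetic data (assumption): 3 clusters, 30 samples, 8 genes.
# Gene 0 separates cluster 0, gene 1 separates cluster 2, genes 2-7 are noise.
rng = random.Random(0)
centers = {0: (6.0, 0.0), 1: (0.0, 0.0), 2: (0.0, 6.0)}
y = [i % 3 for i in range(30)]
X = [
    [centers[label][0] + rng.gauss(0, 0.3), centers[label][1] + rng.gauss(0, 0.3)]
    + [rng.gauss(0, 1) for _ in range(6)]
    for label in y
]

panel = eliminate_features(X, y)
print("retained genes:", panel, "CV accuracy:", cv_accuracy(X, y, panel))
```

The stopping rule mirrors the trade-off described in the abstract: the panel shrinks only while cross-validated prediction of the cluster assignment stays at the required accuracy. On real transcriptome data a more robust classifier and a nested validation scheme would likely be preferable, since selecting features and estimating accuracy on the same folds tends to give optimistic results.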