Merkley Eric D, Sego Landon H, Lin Andy, Leiser Owen P, Kaiser Brooke L Deatherage, Adkins Joshua N, Keim Paul S, Wagner David M, Kreuzer Helen W
Chemical and Biological Signature Sciences, Pacific Northwest National Laboratory, Richland, Washington, United States of America.
Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, Washington, United States of America.
PLoS One. 2017 Aug 30;12(8):e0183478. doi: 10.1371/journal.pone.0183478. eCollection 2017.
The rapid pace of bacterial evolution enables organisms to adapt to the laboratory environment with repeated passage and thus diverge from naturally occurring environmental ("wild") strains. Distinguishing wild and laboratory strains is clearly important for biodefense and bioforensics; however, DNA sequence data alone have thus far not provided a clear signature, perhaps because of a lack of understanding of how diverse genome changes lead to convergent phenotypes, because certain types of mutations are difficult to detect, or because some adaptive modifications are epigenetic. Monitoring protein abundance, a molecular measure of phenotype, can overcome some of these difficulties. We have assembled a collection of Yersinia pestis proteomics datasets from our own published and unpublished work, and from a proteomics data archive, and demonstrated that protein abundance data can clearly distinguish laboratory-adapted from wild strains. We developed a lasso logistic regression classifier that uses binary (presence/absence) or quantitative protein abundance measures to predict whether a sample is laboratory-adapted or wild; the classifier proved to be ~98% accurate, as judged by replicated 10-fold cross-validation. Protein features selected by the classifier accord well with our previous study of laboratory adaptation in Y. pestis. The input data were derived from a variety of unrelated experiments and contained significant confounding variables. We show that the classifier is robust with respect to these variables. The methodology is able to discover signatures for laboratory facility and culture medium that are largely independent of the signature of laboratory adaptation. Going beyond our previous laboratory evolution study, this work suggests that proteomic differences between laboratory-adapted and wild Y. pestis are general, potentially pointing to a process that could apply to other species as well.
Additionally, we show that proteomics datasets (even archived data collected for different purposes) contain the information necessary to distinguish wild and laboratory samples. This work has clear applications in biomarker detection as well as biodefense.
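As an illustration of the kind of classifier described above, the following is a minimal pure-Python sketch of L1-penalized (lasso) logistic regression evaluated by replicated 10-fold cross-validation. It is not the authors' implementation. The data are synthetic presence/absence values, and every name and setting in it (`make_toy_data`, `fit_lasso_logistic`, `replicated_cv_accuracy`, the penalty `lam`, the step size, the iteration count) is a hypothetical choice made for the example. The fit uses proximal gradient descent, which is one common way to solve the lasso problem; the abstract does not say which solver was used.

```python
import math
import random

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a weight toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_lasso_logistic(X, y, lam=0.05, step=0.5, n_iter=200):
    """Proximal gradient descent for L1-penalized logistic regression.
    lam, step and n_iter are illustrative values, not tuned settings."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def predict(w, b, X):
    """Label 1 (laboratory-adapted) when the linear score is positive."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0
            for xi in X]

def replicated_cv_accuracy(X, y, k=10, reps=2, seed=0):
    """Mean accuracy over `reps` independent random k-fold partitions."""
    rng = random.Random(seed)
    n = len(X)
    accuracies = []
    for _ in range(reps):
        idx = list(range(n))
        rng.shuffle(idx)
        correct = 0
        for fold in range(k):
            test_idx = idx[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            w, b = fit_lasso_logistic([X[i] for i in train_idx],
                                      [y[i] for i in train_idx])
            preds = predict(w, b, [X[i] for i in test_idx])
            correct += sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / n)
    return sum(accuracies) / reps

def make_toy_data(n=40, p=10, seed=1):
    """Synthetic presence/absence matrix (hypothetical, not real proteomics).
    Column 0 is detected only in laboratory-adapted samples (label 1),
    column 1 only in wild samples (label 0); the rest are random noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        noise = [float(rng.random() < 0.5) for _ in range(p - 2)]
        X.append([float(label == 1), float(label == 0)] + noise)
        y.append(label)
    return X, y

X, y = make_toy_data()
w, b = fit_lasso_logistic(X, y)
accuracy = replicated_cv_accuracy(X, y)
print("nonzero weights at columns:", [j for j, wj in enumerate(w) if wj != 0.0])
print("replicated 10-fold CV accuracy:", accuracy)
```

The L1 penalty drives the weights of uninformative columns to exactly zero, which is why a lasso fit doubles as a feature selector. For real data, an established library with a tuned penalty would be the more sensible choice than this hand-written loop.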