Duncan Rebecca P, Lewin Gina R, Cornforth Daniel M, Diggle Frances L, Kapur Ananya, Moustafa Dina A, Hilliam Yasmin, Bomberger Jennifer M, Whiteley Marvin, Goldberg Joanna B
Division of Pulmonary, Asthma, Cystic Fibrosis, and Sleep, Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, USA.
Emory-Children's Cystic Fibrosis Center, Atlanta, Georgia, USA.
Microbiol Spectr. 2025 Jan 7;13(1):e0151324. doi: 10.1128/spectrum.01513-24. Epub 2024 Dec 3.
Reproducibility is a fundamental expectation in science and enables investigators to have confidence in their research findings and the ability to compare data from disparate sources, but evaluating reproducibility can be elusive. For example, generating RNA sequencing (RNA-seq) data includes multiple steps where variance can be introduced. Thus, it is unclear if RNA-seq data from different sources can be validly compared. While most studies on RNA-seq reproducibility focus on eukaryotes, we evaluate bias in bacteria using gene expression data from five laboratory models of cystic fibrosis. We leverage a large data set that includes samples prepared in three different laboratories and paired data sets where the same sample was sequenced using at least two different sequencing pipelines. We report here that expression data are highly reproducible across laboratories. In addition, while samples sequenced with different sequencing pipelines showed significantly more variance in expression profiles than between labs, gene expression was still highly reproducible between sequencing pipelines. Further investigation of expression differences between two sequencing pipelines revealed that library preparation methods were the largest source of error, though analyses to identify the source of this variance were inconclusive. Consistent with the reproducibility of expression between sequencing pipelines, we found that different pipelines detected over 80% of the same differentially expressed genes with large expression differences between conditions. Thus, bacterial RNA-seq data from different sources can be validly compared, facilitating the ability to advance understanding of bacterial behavior and physiology using the wide array of publicly available RNA-seq data sets.

IMPORTANCE: RNA sequencing (RNA-seq) has revolutionized biology, but many steps in RNA-seq workflows can introduce variance, potentially compromising reproducibility. While reproducibility in RNA-seq has been thoroughly investigated in eukaryotes, less is known about pipelines and workflows that introduce variance and biases in bacterial RNA-seq data. By leveraging transcriptomes in cystic fibrosis models from different laboratories and sequenced with different sequencing pipelines, we directly assess sources of bacterial RNA-seq variance. RNA-seq data were highly reproducible, with the largest variance due to sequencing pipelines, specifically library preparation. Different sequencing pipelines detected overlapping differentially expressed genes, especially those with large expression differences between conditions. This study confirms that different approaches to preparing and sequencing bacterial RNA libraries capture comparable transcriptional profiles, supporting investigators' ability to leverage diverse RNA-seq data sets to advance their science.
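The comparison of differentially expressed genes between sequencing pipelines can be illustrated with a minimal sketch. The abstract does not state how the overlap was computed, so everything here is an assumption made for illustration: the data layout (gene ID mapped to log2 fold change and adjusted p-value), the cutoffs, the use of a Jaccard-style shared fraction, and the toy values, which are not data from the study.

```python
from typing import Dict, Set, Tuple

# Assumed layout: gene ID -> (log2 fold change, adjusted p-value).
DEResult = Dict[str, Tuple[float, float]]


def significant_genes(results: DEResult,
                      padj_cutoff: float = 0.05,
                      min_abs_lfc: float = 1.0) -> Set[str]:
    """Return genes passing both cutoffs (cutoff values are hypothetical)."""
    return {
        gene
        for gene, (lfc, padj) in results.items()
        if padj < padj_cutoff and abs(lfc) >= min_abs_lfc
    }


def shared_fraction(a: DEResult, b: DEResult,
                    padj_cutoff: float = 0.05,
                    min_abs_lfc: float = 1.0) -> float:
    """Fraction of genes called by either pipeline that both pipelines call."""
    sig_a = significant_genes(a, padj_cutoff, min_abs_lfc)
    sig_b = significant_genes(b, padj_cutoff, min_abs_lfc)
    union = sig_a | sig_b
    if not union:
        return 0.0  # nothing called by either pipeline
    return len(sig_a & sig_b) / len(union)


# Toy results for the same sample processed by two hypothetical pipelines.
pipeline_a: DEResult = {
    "gene1": (3.2, 0.001),
    "gene2": (-2.5, 0.01),
    "gene3": (0.4, 0.2),
    "gene4": (1.1, 0.04),
}
pipeline_b: DEResult = {
    "gene1": (2.9, 0.002),
    "gene2": (-2.1, 0.03),
    "gene3": (0.6, 0.3),
    "gene4": (0.8, 0.06),
}

moderate = shared_fraction(pipeline_a, pipeline_b, min_abs_lfc=1.0)
large = shared_fraction(pipeline_a, pipeline_b, min_abs_lfc=2.0)
print(f"shared fraction, |log2FC| >= 1: {moderate:.2f}")
print(f"shared fraction, |log2FC| >= 2: {large:.2f}")
```

In this toy example the two pipelines disagree only on a gene near the threshold, so raising the fold-change cutoff increases the shared fraction. That mirrors, in spirit only, the reported pattern that agreement is strongest for genes with large expression differences between conditions.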