Furumo Quinlan, Meyer Michelle M
Boston College, Department of Biology, Chestnut Hill MA 02467.
bioRxiv. 2025 Jun 12:2025.06.12.658996. doi: 10.1101/2025.06.12.658996.
3-prime end sequencing (3'-seq) is a high-throughput sequencing technique used to specifically quantify changes in 3'-end formation of transcripts in bacterial cells, and it is increasingly used to address fundamental questions about transcription termination and pausing across a range of bacterial species. However, the growing number of 3'-seq studies has been accompanied by a growing number of study-specific 3'-seq data analysis approaches. Thus, differences in factors including experimental design, data collection approaches, analysis methodologies, and interpretation decisions make it challenging to confidently compare results from different studies, even those performed on the same organism. To assess the potential severity of these discrepancies, we used PIPETS, a statistically robust and genome-annotation-agnostic 3'-seq analysis package, to analyze 3'-seq data sets from three different groups collected under similar conditions. By using a consistent analysis and results interpretation approach, we identified large disparities in the characteristics of the raw 3'-seq data between the studies, despite all three studies using the same strain and very similar reported experimental conditions. Additionally, we found strand-specific inconsistencies, with some data sets having reference-strand 3'-seq read coverage distributions that differed greatly from the complement strand within the same replicate. Finally, when the 3'-seq distribution profiles of the three studies were compared to studies from four additional bacteria, we identified clustering patterns in the 3'-seq results that are not explained by phylogenetic similarity between organisms.
Given the large differences between data sets from the same organism, as well as the inconsistencies between replicates within the same data sets, we urge the field to reconsider the assumptions around 3'-seq data homogeneity and to move towards consistent analysis approaches and cautious interpretation of the data.
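One simple way to quantify a strand-specific difference in 3'-end read coverage can be sketched as follows. This is not the PIPETS method or the authors' analysis. It assumes, hypothetically, that each strand's 3'-end read counts for a replicate are available as a mapping from genome position to count, and it compares the two strands' count distributions with a two-sample Kolmogorov-Smirnov D statistic (0 means identical empirical distributions, 1 means completely separated). The function names and toy data are illustrative only.

```python
from bisect import bisect_right
from typing import Dict, List


def nonzero_counts(coverage: Dict[int, int]) -> List[int]:
    """Return read counts at positions with at least one 3' end.

    Hypothetical input format: {genome_position: 3'-end read count}.
    """
    return [count for count in coverage.values() if count > 0]


def ks_statistic(a: List[int], b: List[int]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the two ECDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The ECDF gap can only change at observed values, so checking each is enough.
    for x in a + b:
        frac_a = bisect_right(a, x) / len(a)
        frac_b = bisect_right(b, x) / len(b)
        d = max(d, abs(frac_a - frac_b))
    return d


# Toy per-strand coverage for one replicate (made-up numbers, not study data).
reference = {100: 5, 250: 40, 900: 3}
complement = {120: 4, 300: 38, 800: 2}

d = ks_statistic(nonzero_counts(reference), nonzero_counts(complement))
print(f"KS D between strands: {d:.3f}")
```

A large D for a replicate would flag the kind of reference-versus-complement asymmetry described above; a library routine such as `scipy.stats.ks_2samp` computes the same statistic and also reports a p-value.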