PIPETS: A statistically informed, gene-annotation agnostic analysis method to study bacterial termination using 3'-end sequencing.

Author information

Furumo Quinlan, Meyer Michelle

Affiliation

Department of Biology, Boston College, Chestnut Hill, MA, 02135, United States.

Publication information

bioRxiv. 2024 Nov 5:2024.03.18.585559. doi: 10.1101/2024.03.18.585559.

DOI:10.1101/2024.03.18.585559

PMID:38562853

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10983905/

Abstract

BACKGROUND

Over the last decade, the drop in short-read sequencing costs has allowed experimental techniques that use sequencing to address specific biological questions to proliferate, oftentimes outpacing standardized or effective analysis approaches for the data generated. There are growing amounts of bacterial 3'-end sequencing data, yet there is currently no commonly accepted analysis methodology for this data type. Most data analysis approaches are somewhat ad hoc and, despite the presence of substantial signal within annotated genes, focus on genomic regions outside the annotated genes (e.g. 3' or 5' UTRs). Furthermore, the lack of consistent, systematic analysis approaches, together with the absence of genome-wide ground truth data, makes it impossible to compare conclusions generated by different labs using different organisms.

RESULTS

We present PIPETS (Poisson Identification of PEaks from Term-Seq data), an R package available on Bioconductor that provides a novel analysis method for 3'-end sequencing data. PIPETS is a statistically informed, gene-annotation agnostic methodology. Across two different datasets from two different organisms, PIPETS identified significant 3'-end termination signal across a wider range of annotated genomic contexts than existing analysis approaches, suggesting that existing approaches may miss biologically relevant signal. Furthermore, assessment of the previously called 3'-end positions not captured by PIPETS showed that they were uniformly very low coverage.
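To make the idea of a statistically informed, annotation-agnostic call concrete, the following Python sketch flags genomic positions whose 3'-end read count is improbably high under a Poisson model of the local background. It is an illustration only and not the PIPETS implementation: the function names, the window size, the significance cutoff, and the background floor are all assumptions introduced here, and the actual package is written in R.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam).

    Computed as 1 - P(X <= k - 1); adequate for the modest rates used here.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_positions(counts, window=25, alpha=0.001, min_rate=1.0):
    """Flag positions with unexpectedly many 3'-end reads.

    counts   -- per-position 3'-end read counts for one strand (list of int)
    window   -- hypothetical number of flanking positions used as background
    alpha    -- hypothetical significance cutoff
    min_rate -- hypothetical floor on the background rate, so that isolated
                single reads in empty regions are not called
    No gene annotation is consulted at any point.
    """
    hits = []
    n = len(counts)
    for pos, k in enumerate(counts):
        if k == 0:
            continue
        lo, hi = max(0, pos - window), min(n, pos + window + 1)
        # Background excludes the tested position itself.
        neighbours = counts[lo:pos] + counts[pos + 1:hi]
        mean = sum(neighbours) / len(neighbours) if neighbours else 0.0
        lam = max(mean, min_rate)
        p = poisson_sf(k, lam)
        if p < alpha:
            hits.append((pos, k, p))
    return hits
```

Calling `call_positions` on a strand-specific count vector returns `(position, count, p_value)` tuples for the flagged positions. Because the background is estimated from the neighbouring coverage alone, the same test applies inside and outside annotated genes.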

CONCLUSIONS

PIPETS provides a broadly applicable platform to explore and analyze 3'-end sequencing data sets from different organisms. It requires only the 3'-end sequencing data and is broadly accessible to non-expert users.
