Monleon-Getino Toni, Frias-Lopez Jorge
Section of Statistics (Department of Genetics, Microbiology, and Statistics), University of Barcelona, Barcelona, Spain.
BIOST3 GRBIO (Research Group in Biostatistics and Bioinformatics), Barcelona, Spain.
Ecol Evol. 2020 Nov 5;10(23):13382-13394. doi: 10.1002/ece3.6941. eCollection 2020 Dec.
Metatranscriptome analysis, the analysis of the expression profiles of whole microbial communities, faces the additional challenge of dealing with a complex system in which dozens of different organisms express genes simultaneously. An underlying issue for virtually all metatranscriptomic sequencing experiments is how to allocate the limited sequencing budget while guaranteeing that the libraries have sufficient depth to cover the breadth of expression of the community. Estimating the sequencing depth required to effectively sample the target metatranscriptome using RNA-seq is an essential first step to obtain robust results in subsequent analyses and to avoid sequencing beyond the point at which the information contained in the library reaches saturation. Here, we present a method to calculate the sequencing effort using a simulated series of metatranscriptomic/metagenomic matrices. The method is based on extrapolating a rarefaction curve with a Weibull growth model to estimate the maximum number of observed genes as a function of sequencing depth. This approach allowed us to compute the effort at different confidence intervals and to obtain an approximate a priori effort based on an initial fraction of sequences. The analytical pipeline presented here may be successfully used for the in-depth and time-effective characterization of complex microbial communities, representing a useful tool for the microbiome research community.
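The core idea, fitting a Weibull growth curve to a rarefaction curve and reading off the depth at which the gene count approaches its plateau, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The parametrization G(x) = a(1 - exp(-(x/b)^c)), the grid-search fitting procedure, the function names, and the synthetic data are all assumptions made for illustration, and the confidence-interval step mentioned in the abstract is not reproduced.

```python
import math

def weibull_growth(x, a, b, c):
    """Assumed Weibull growth form: a = asymptote (maximum number of genes),
    b = scale (in reads), c = shape. Not necessarily the paper's parametrization."""
    return a * (1.0 - math.exp(-((x / b) ** c)))

def fit_weibull(depths, genes, n_grid=400, max_factor=5.0):
    """Hypothetical fitting routine: grid-search the asymptote a and, for each
    candidate, estimate b and c by linear least squares on the linearized form
    ln(-ln(1 - y/a)) = c*ln(x) - c*ln(b). Requires positive depths and counts."""
    y_max = max(genes)
    lx = [math.log(x) for x in depths]
    mean_lx = sum(lx) / len(lx)
    sxx = sum((v - mean_lx) ** 2 for v in lx)
    best = None
    for i in range(1, n_grid + 1):
        # Candidate asymptotes lie strictly above the largest observed count.
        a = y_max * (1.0 + (max_factor - 1.0) * i / n_grid)
        z = [math.log(-math.log(1.0 - y / a)) for y in genes]
        mean_z = sum(z) / len(z)
        sxy = sum((u - mean_lx) * (v - mean_z) for u, v in zip(lx, z))
        c = sxy / sxx
        if c <= 0:
            continue  # a non-increasing curve is not a valid growth model
        b = math.exp(mean_lx - mean_z / c)
        sse = sum((weibull_growth(x, a, b, c) - y) ** 2
                  for x, y in zip(depths, genes))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no valid fit found; check that counts increase with depth")
    return best[1], best[2], best[3]

def depth_for_fraction(p, b, c):
    """Sequencing depth at which the fitted curve reaches a fraction p of its asymptote."""
    return b * (-math.log(1.0 - p)) ** (1.0 / c)

# Synthetic, noise-free rarefaction curve (illustrative values, not real data).
depths = [1e5, 2e5, 5e5, 1e6, 2e6, 4e6]
genes = [weibull_growth(x, 10000.0, 2e6, 0.8) for x in depths]

a_hat, b_hat, c_hat = fit_weibull(depths, genes)
print("estimated maximum number of genes:", round(a_hat))
print("reads needed for 95% of the maximum:", round(depth_for_fraction(0.95, b_hat, c_hat)))
```

The grid search keeps the sketch dependency-free. In practice a nonlinear least-squares routine would likely be preferable, and real rarefaction data would need resampling or replicate libraries to attach uncertainty to the estimated effort.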