Guide for library design and bias correction for large-scale transcriptome studies using highly multiplexed RNAseq methods.

Affiliations

Department of Biosciences and Nutrition, Karolinska Institutet, 14183, Huddinge, Sweden.

Department of Women's and Children's Health, Karolinska Institutet, 17177, Stockholm, Sweden.

Publication information

BMC Bioinformatics. 2019 Aug 13;20(1):418. doi: 10.1186/s12859-019-3017-9.

Abstract

BACKGROUND

Standard RNAseq methods for bulk RNA and recent single-cell RNAseq methods use DNA barcodes to identify samples and cells, and the barcoded cDNAs are pooled into a library before high-throughput sequencing. In single-cell and low-input RNAseq methods, the library is further amplified by PCR after pooling. Preparing hundreds or more samples for a large study often requires multiple library pools. However, the expression profiles of different libraries are sometimes poorly correlated, and batch effects make it difficult to integrate data across library pools.
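One way to quantify how poorly the libraries agree is to compare the average expression profiles of technical replicates library by library. The sketch below is a hypothetical illustration, not the authors' procedure: the helper names, the counts-per-million (CPM) scaling, the log2 transform with a pseudocount of 1, and the use of Pearson correlation are all assumptions.

```python
import math
from itertools import combinations


def log_cpm(counts, pseudocount=1.0):
    """Scale raw counts to counts per million, then take log2(CPM + pseudocount)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def library_correlations(libraries):
    """libraries: dict mapping library name -> list of per-sample count vectors,
    with genes in the same order everywhere. Averages each library's log-CPM
    profiles and returns {(libA, libB): r} for every pair of libraries."""
    means = {}
    for name, samples in libraries.items():
        profiles = [log_cpm(s) for s in samples]
        n_genes = len(profiles[0])
        means[name] = [sum(p[g] for p in profiles) / len(profiles) for g in range(n_genes)]
    return {(a, b): pearson(means[a], means[b]) for a, b in combinations(sorted(means), 2)}
```

For technical replicates of the same RNA, pairs of libraries with r well below the others point to a library-level bias rather than biological variation.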

RESULTS

We investigated 166 technical replicates in 14 RNAseq libraries made using the STRT method. The patterns of library bias differed between genes, and uneven library yields were associated with library bias. The former, gene-specific bias was corrected using the NBGLM-LBC algorithm, which we present in the current study. The latter, yield-associated bias could not be corrected directly, but could be avoided by omitting libraries with particularly low yields. A simulation experiment suggested that library bias correction using NBGLM-LBC requires a consistent sample layout. The NBGLM-LBC correction method was applied to expression profiles from a cohort study of childhood acute respiratory illness, and the library biases were resolved.
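The sketch below illustrates the two remedies described above under simplifying assumptions. It is not NBGLM-LBC: as its name suggests, that method is built on a negative binomial generalized linear model, whereas this sketch swaps in the closed-form maximum-likelihood estimate of a Poisson model with one rate per library for each gene and a log size-factor offset. The function names, the use of total counts as size factors, and the 50%-of-median yield cutoff are all hypothetical.

```python
from statistics import median


def drop_low_yield(yields, min_fraction=0.5):
    """Return the set of libraries whose yield is at least min_fraction times
    the median yield. The 0.5 cutoff is a placeholder, not a value from the paper."""
    cutoff = min_fraction * median(yields.values())
    return {lib for lib, y in yields.items() if y >= cutoff}


def correct_library_bias(counts, libraries, keep):
    """Simplified per-gene library-effect correction (Poisson rates, not an NB GLM).

    counts: dict gene -> list of raw counts, one per sample.
    libraries: list of the library name of each sample, in the same order.
    keep: set of library names to retain (e.g. from drop_low_yield).
    Assumes every retained library has non-zero total counts.
    Returns (indices of retained samples, dict gene -> corrected counts).
    """
    idx = [j for j, lib in enumerate(libraries) if lib in keep]
    kept_libs = {libraries[j] for j in idx}
    # Size factor per sample: total counts over all genes, in millions.
    size = {j: sum(y[j] for y in counts.values()) / 1e6 for j in idx}
    corrected = {}
    for gene, y in counts.items():
        # Pooled expression rate of this gene across all retained samples.
        overall = sum(y[j] for j in idx) / sum(size[j] for j in idx)
        factor = {}
        for lib in kept_libs:
            members = [j for j in idx if libraries[j] == lib]
            # Poisson MLE of this library's rate, relative to the pooled rate.
            rate = sum(y[j] for j in members) / sum(size[j] for j in members)
            factor[lib] = rate / overall if overall > 0 else 1.0
        # Divide out the library factor; a zero factor means all counts were zero.
        corrected[gene] = [
            y[j] / factor[libraries[j]] if factor[libraries[j]] > 0 else 0.0
            for j in idx
        ]
    return idx, corrected
```

Because each library factor is estimated from that library's own average, dividing it out removes only technical bias when every library contains the same mix of samples. If one library held mostly cases, the case effect would be absorbed into its library factor and removed along with the bias, which is why a consistent sample layout matters.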

CONCLUSIONS

The R source code for the library bias correction method, named NBGLM-LBC, is available at https://shka.github.io/NBGLM-LBC and https://shka.bitbucket.io/NBGLM-LBC . This method is applicable to correcting library biases in various studies that use highly multiplexed sequencing-based profiling methods, provided that the sample layout is consistent, i.e., the samples to be compared (e.g., "cases" and "controls") are equally distributed in each library.
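A consistent layout can be produced by dealing each condition's samples across the libraries in turn. The helper below is a hypothetical illustration, not part of NBGLM-LBC; it assumes each sample's condition is known before library preparation and that library capacity is not otherwise constrained.

```python
import random
from collections import defaultdict


def balanced_layout(samples, n_libraries, seed=0):
    """samples: list of (sample_id, condition) pairs.
    Returns dict sample_id -> library index (0 .. n_libraries - 1).

    Within each condition, samples are shuffled and dealt round-robin across
    libraries, so per-library counts of each condition differ by at most one.
    Dealing continues where the previous condition stopped, which also keeps
    the total number of samples per library balanced."""
    rng = random.Random(seed)
    by_condition = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)
    layout = {}
    offset = 0
    for condition in sorted(by_condition):
        ids = by_condition[condition][:]
        rng.shuffle(ids)  # randomize which sample lands in which library
        for k, sample_id in enumerate(ids):
            layout[sample_id] = (offset + k) % n_libraries
        offset += len(ids)
    return layout
```

For example, 12 cases and 12 controls spread over 4 libraries yield 3 cases and 3 controls per library, so any library-level factor estimated afterwards is not confounded with the case/control comparison.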

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/695c/6693229/8eebb3b3c00a/12859_2019_3017_Fig1_HTML.jpg
