A comparison of strategies for generating artificial replicates in RNA-seq experiments.

Affiliations

Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany.

Institute for Parasitology, University of Veterinary Medicine Hannover, Foundation, Hannover, Germany.

Publication information

Sci Rep. 2022 May 3;12(1):7170. doi: 10.1038/s41598-022-11302-9.

Abstract

Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ files has recently been used in the context of other NGS analyses and can be used to generate artificial technical replicates. Bootstrapping samples from the columns of the expression matrix has already been used for DNA microarray data and generates a new artificial replicate of the whole experiment. Mixing data of individual samples has been used for data augmentation in machine learning. The aim of this comparison is to evaluate which of these strategies is best suited to study the reproducibility of differential expression and gene-set enrichment analysis in an RNA-seq experiment. To study the approaches under controlled conditions, we performed a new RNA-seq experiment on gene expression changes upon virus infection compared to untreated control samples. In order to compare the approaches for artificial replicates, each of the samples was sequenced twice, i.e. as true technical replicates, and differential expression analysis and GO term enrichment analysis were conducted separately for the two resulting data sets. Although we observed a high correlation between the results from the two replicates, there are still many genes and GO terms that would be selected from one replicate but not from the other. Cluster analyses showed that artificial replicates generated by bootstrapping reads produce p values and fold changes that are close to those obtained from the true data sets. Results generated from artificial replicates with the approaches of column bootstrap or mixing observations were less similar to the results from the true replicates. Furthermore, the overlap of results among replicates generated by column bootstrap or mixing observations was much stronger than among the true replicates.
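The two matrix-level strategies named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `column_bootstrap` and `mix_samples`, the choice to resample within each experimental group, and the fixed mixing weight are hypothetical, since the abstract does not specify these details.

```python
import numpy as np


def column_bootstrap(counts, groups, rng):
    """Resample sample columns with replacement, separately within each group.

    counts: genes x samples array; groups: one label per sample column.
    Resampling within groups is an assumption made for this sketch.
    """
    groups = np.asarray(groups)
    picked = []
    for label in np.unique(groups):
        idx = np.flatnonzero(groups == label)
        picked.extend(rng.choice(idx, size=idx.size, replace=True))
    picked = np.array(picked)
    return counts[:, picked], groups[picked]


def mix_samples(counts, groups, rng, weight=0.5):
    """Build one artificial sample per original sample by mixing it with a
    randomly chosen sample of the same group (hypothetical mixing scheme)."""
    groups = np.asarray(groups)
    mixed = np.empty(counts.shape, dtype=float)
    for j in range(counts.shape[1]):
        partners = np.flatnonzero(groups == groups[j])
        k = rng.choice(partners)
        mixed[:, j] = weight * counts[:, j] + (1.0 - weight) * counts[:, k]
    return mixed


# Toy expression matrix: 4 genes, 3 control and 3 infected samples.
rng = np.random.default_rng(0)
counts = np.arange(24).reshape(4, 6)
groups = ["ctrl"] * 3 + ["inf"] * 3
boot_counts, boot_groups = column_bootstrap(counts, groups, rng)
mixed_counts = mix_samples(counts, groups, rng)
```

Both functions reuse whole sample columns that already exist in the data, which is one plausible reason why replicates generated this way overlap more strongly with each other than true technical replicates do.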
Artificial technical replicates generated by bootstrapping sequencing reads from FASTQ-files are better suited to study the reproducibility of results from differential expression and GO term enrichment analysis in RNA-seq experiments than column bootstrap or mixing observations. However, FASTQ-bootstrapping is computationally more expensive than the other two approaches. The FASTQ-bootstrapping may be applicable to other applications of high-throughput sequencing.
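Read-level bootstrapping can be illustrated with a short Python sketch. It assumes an uncompressed, single-end FASTQ file with exactly four lines per record; the helper names `read_fastq` and `bootstrap_fastq` and the `_boot` read-name suffix are hypothetical and not taken from the paper.

```python
import random


def read_fastq(path):
    """Return a list of 4-line FASTQ records from an uncompressed file."""
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines) - 3, 4)]


def bootstrap_fastq(in_path, out_path, seed=None):
    """Write an artificial replicate by drawing reads with replacement.

    The output has as many reads as the input. All records are held in
    memory, so this sketch only suits small files.
    """
    rng = random.Random(seed)
    records = read_fastq(in_path)
    sample = rng.choices(records, k=len(records))
    with open(out_path, "w") as handle:
        for n, (header, seq, plus, qual) in enumerate(sample):
            # The suffix keeps read names unique when a read is drawn twice.
            name = header.split()[0]
            handle.write(f"{name}_boot{n}\n{seq}\n{plus}\n{qual}\n")
    return len(sample)
```

Every bootstrapped file then has to pass through the full alignment and quantification pipeline again, which is consistent with the higher computational cost noted above. For paired-end data, the same read indices would have to be drawn from both files so that mates stay together.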

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d9/9065086/04ad07d130d3/41598_2022_11302_Fig1_HTML.jpg
