Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities.

Author information

Department of Anaesthesiology, HELIOS University Hospital Wuppertal, University of Witten/Herdecke, Heusnerstr. 40, 42283 Wuppertal, Germany.

Institut für Virologie, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany.

Publication information

Int J Mol Sci. 2018 Nov 21;19(11):3687. doi: 10.3390/ijms19113687.

Abstract

We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation group indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. To provide a simple, readily applicable tool, we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicates the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool for RNAseq experiments. Further exploration is required once samples are identified as outliers in HC-derived trees.
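The idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the seqTools implementation and does not use its API: the function names, the toy reads, the choice of k = 2, the Canberra distance, and average linkage are all assumptions made for illustration, since the abstract does not state which distance or linkage is used.

```python
from collections import Counter
from itertools import product


def kmer_counts(reads, k):
    """Count k-mers over A/C/G/T in a list of read sequences (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def profile(reads, k):
    """Relative k-mer frequencies in a fixed order over all 4**k possible k-mers."""
    counts = kmer_counts(reads, k)
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def canberra(x, y):
    """Canberra distance (an assumed choice); terms with zero denominator are skipped."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0)


def average_linkage(profiles):
    """Naive agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(canberra(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Toy data (made up): two AT-rich "samples" and two GC-rich "samples".
samples = {
    "a1": ["ATATATATAT", "AATTAATTAA"],
    "a2": ["ATATATATTA", "AATTAATTAT"],
    "b1": ["GCGCGCGCGC", "GGCCGGCCGG"],
    "b2": ["GCGCGCGCCG", "GGCCGGCCGC"],
}
merges = average_linkage({name: profile(reads, 2) for name, reads in samples.items()})
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

In this toy case the two AT-rich samples and the two GC-rich samples each merge first, and the final merge joins the two groups. Whether such a grouping reflects an experimental effect or a batch effect depends on how the samples were prepared, which is the diagnostic question the abstract raises.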

