Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities.

Affiliations

Department of Anaesthesiology, HELIOS University Hospital Wuppertal, University of Witten/Herdecke, Heusnerstr. 40, 42283 Wuppertal, Germany.

Institut für Virologie, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany.

Publication information

Int J Mol Sci. 2018 Nov 21;19(11):3687. doi: 10.3390/ijms19113687.

DOI: 10.3390/ijms19113687
PMID: 30469355
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6274891/

Abstract

We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation group indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. To provide a simple, readily applicable tool, we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicates the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool for RNAseq experiments. Further exploration is required once samples are identified as outliers in HC-derived trees.
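
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the seqTools implementation or its API. The function names, the choice of Canberra distance, the average-linkage rule, the toy reads, and the assumption of strict four-line Fastq records are all illustrative assumptions made here. The sketch counts k-mers per sample, normalizes them to relative frequencies, and merges the closest samples step by step; samples that merge late in the tree are candidates for further inspection.

```python
import gzip
from collections import Counter
from itertools import product


def read_fastq_sequences(path):
    """Yield sequence lines one at a time (assumes strict 4-line Fastq records)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record holds the bases
                yield line.strip()


def kmer_profile(reads, k=2):
    """Relative frequency of each of the 4**k DNA k-mers over all reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def canberra(u, v):
    """Canberra distance; positions where both entries are zero are skipped."""
    return sum(abs(a - b) / (a + b) for a, b in zip(u, v) if a + b > 0)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    names = list(profiles)
    dist = {
        frozenset((a, b)): canberra(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between the two clusters' members
                pairs = [dist[frozenset((a, b))]
                         for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


# Toy data (hypothetical): two AT-rich and two GC-rich "samples"
samples = {
    "A1": ["ATATATATAT", "AATTAATTAA"],
    "A2": ["ATATATTATA", "TTAATTAATT"],
    "B1": ["GCGCGCGCGC", "GGCCGGCCGG"],
    "B2": ["GCGCGGCGCG", "CCGGCCGGCC"],
}
profiles = {name: kmer_profile(reads, k=2) for name, reads in samples.items()}
merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

For real data, `kmer_profile(read_fastq_sequences(path), k=...)` would replace the toy reads. Reads are streamed one at a time, so memory use is bounded by the 4**k profile rather than by file size, which mirrors the low-memory, sequential design the abstract describes.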
