Quality control of next-generation sequencing data without a reference.

Affiliations

Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK.

Edinburgh Genomics, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK; Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK.

Publication information

Front Genet. 2014 May 6;5:111. doi: 10.3389/fgene.2014.00111. eCollection 2014.

DOI: 10.3389/fgene.2014.00111

PMID: 24834071

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4018527/

Abstract

Next-generation sequencing (NGS) technologies have dramatically expanded the breadth of genomics. Genome-scale data, once restricted to a small number of biomedical model organisms, can now be generated for virtually any species at remarkable speed and low cost. Yet non-model organisms often lack a suitable reference to map sequence reads against, making alignment-based quality control (QC) of NGS data more challenging than in cases where a well-assembled genome is already available. Here we show that by generating a rapid, non-optimized draft assembly of raw reads, it is possible to obtain reliable and informative QC metrics, thus removing the need for a high-quality reference. We use benchmark datasets generated from control samples across a range of genome sizes to illustrate that QC inferences made using draft assemblies are broadly equivalent to those made using a well-established reference, and describe QC tools routinely used in our production facility to assess the quality of NGS data from non-model organisms.
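As a rough illustration of the kind of summary that can be computed once a draft assembly exists, the sketch below parses FASTA text and reports contig count, total length, N50, and GC fraction. The abstract does not name specific metrics, so this choice of statistics is an assumption, and the function names (`parse_fasta`, `n50`, `draft_assembly_summary`) are hypothetical and not taken from the authors' tools.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {contig_name: sequence}."""
    contigs: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name;
            # fall back to a generated name if the header is empty.
            parts = line[1:].split()
            name = parts[0] if parts else f"contig_{len(contigs) + 1}"
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in contigs.items()}


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def draft_assembly_summary(fasta_text: str) -> Dict[str, float]:
    """Summarise a draft assembly (illustrative metrics only, chosen as an assumption)."""
    seqs = parse_fasta(fasta_text)
    lengths = [len(s) for s in seqs.values()]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs.values())
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "gc_fraction": gc / total if total else 0.0,
    }
```

In practice the FASTA text would come from the contig file written by whichever assembler produced the draft. The alignment-based QC described in the abstract would additionally require mapping the reads back to those contigs, which this sketch does not attempt.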

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b35d/4018527/55cc7e68bcf6/fgene-05-00111-g0001.jpg
