Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples.

Author information

Chouvarine Philippe, Wiehlmann Lutz, Moran Losada Patricia, DeLuca David S, Tümmler Burkhard

Affiliations

Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030, United States of America.

Clinical Research Group, 'Molecular Pathology of Cystic Fibrosis and Pseudomonas Genomics', OE 6710, Hannover Medical School, Hannover D-30625, Germany.

Publication information

PLoS One. 2016 Oct 19;11(10):e0165015. doi: 10.1371/journal.pone.0165015. eCollection 2016.

DOI:10.1371/journal.pone.0165015

PMID:27760173

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5070866/

Abstract

Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without reliance on the knowledge of genotypes and phenotypes of the members of the bacterial community. It also makes it possible to overcome problems of 16S rDNA sequencing, such as unknown copy number of the 16S gene and lack of sufficient sequence similarity of the "universal" 16S primers to some of the target 16S genes. On the other hand, next-generation sequencing suffers from biases resulting in non-uniform coverage of the sequenced genomes. To overcome this difficulty, we present a model of GC-bias in sequencing metagenomic samples as well as filtration and normalization techniques necessary for accurate quantification of microbial organisms. While there has been substantial research in normalization and filtration of read-count data in such techniques as RNA-seq or ChIP-seq, to our knowledge, this has not been the case for the field of whole-metagenome shotgun sequencing. The presented methods assume that complete genome references are available for most microorganisms of interest present in metagenomic samples. This is often a valid assumption in such fields as medical diagnostics of patient microbiota. Testing the model on two validation datasets showed four-fold reduction in root-mean-square error compared to non-normalized data in both cases. The presented methods can be applied to any pipeline for whole-metagenome sequencing analysis relying on complete microbial genome references. We demonstrate that such pre-processing reduces the number of false positive hits and increases accuracy of abundance estimates.
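The abstract does not spell out the GC-bias model, so the following is only a minimal sketch of the general idea it describes: read counts per reference genome are corrected for genome length and for a GC-dependent coverage factor, and the resulting abundance estimates are scored against known values with the root-mean-square error. The function names, the toy bias curve `toy_bias`, and all numbers below are hypothetical illustrations, not the authors' model or data.

```python
import math


def gc_fraction(seq):
    """Fraction of G/C bases among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def normalized_abundances(read_counts, genome_lengths, gc_fractions, bias):
    """Relative abundances after length and GC-bias correction.

    `bias` is an assumed, user-supplied function mapping a genome's GC
    fraction to its expected relative coverage (1.0 means no bias).
    """
    corrected = {
        name: count / (genome_lengths[name] * bias(gc_fractions[name]))
        for name, count in read_counts.items()
    }
    total = sum(corrected.values())
    return {name: value / total for name, value in corrected.items()}


def rmse(estimated, expected):
    """Root-mean-square error between two abundance dictionaries."""
    names = sorted(expected)
    squared = sum((estimated[n] - expected[n]) ** 2 for n in names)
    return math.sqrt(squared / len(names))


def toy_bias(gc):
    """Hypothetical bias curve: coverage drops away from 50% GC."""
    return 1.0 - 2.0 * (gc - 0.5) ** 2


# Made-up example: two genomes of equal length, mixed in equal amounts.
read_counts = {"genome_a": 1000, "genome_b": 920}
genome_lengths = {"genome_a": 5_000_000, "genome_b": 5_000_000}
gc_fractions = {"genome_a": 0.5, "genome_b": 0.7}
expected = {"genome_a": 0.5, "genome_b": 0.5}

# Baseline without GC correction: bias factor fixed at 1.0.
raw = normalized_abundances(read_counts, genome_lengths, gc_fractions,
                            lambda gc: 1.0)
corrected = normalized_abundances(read_counts, genome_lengths, gc_fractions,
                                  toy_bias)
print("RMSE without GC correction:", rmse(raw, expected))
print("RMSE with GC correction:", rmse(corrected, expected))
```

In practice the bias curve would have to be fitted to data from samples of known composition; the quadratic used here was chosen only so that the toy read counts match it.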

Similar articles

Quantitative Assessment of Shotgun Metagenomics and 16S rDNA Amplicon Sequencing in the Study of Human Gut Microbiome.

OMICS. 2018 Apr;22(4):248-254. doi: 10.1089/omi.2018.0013.

Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data.

PLoS One. 2020 Feb 13;15(2):e0228899. doi: 10.1371/journal.pone.0228899. eCollection 2020.

ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads.

Bioinformatics. 2018 Mar 15;34(6):928-935. doi: 10.1093/bioinformatics/btx702.

Bioinformatics for NGS-based metagenomics and the application to biogas research.

J Biotechnol. 2017 Nov 10;261:10-23. doi: 10.1016/j.jbiotec.2017.08.012. Epub 2017 Aug 18.

CAIM: coverage-based analysis for identification of microbiome.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae424.

MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function.

Methods Mol Biol. 2016;1399:207-33. doi: 10.1007/978-1-4939-3369-3_13.

VITCOMIC2: visualization tool for the phylogenetic composition of microbial communities based on 16S rRNA gene amplicons and metagenomic shotgun sequencing.

BMC Syst Biol. 2018 Mar 19;12(Suppl 2):30. doi: 10.1186/s12918-018-0545-2.

Intestinal microbiota domination under extreme selective pressures characterized by metagenomic read cloud sequencing and assembly.

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):585. doi: 10.1186/s12859-019-3073-1.

Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

Biochem Biophys Res Commun. 2016 Jan 22;469(4):967-77. doi: 10.1016/j.bbrc.2015.12.083. Epub 2015 Dec 22.

Cited by

Shotgun and Hi-C Sequencing Datasets for Binning Wheat Rhizosphere Microbiome.

Sci Data. 2025 Mar 1;12(1):367. doi: 10.1038/s41597-025-04651-3.

Enhancing Clinical Utility: Utilization of International Standards and Guidelines for Metagenomic Sequencing in Infectious Disease Diagnosis.

Int J Mol Sci. 2024 Mar 15;25(6):3333. doi: 10.3390/ijms25063333.

Comparative analysis of HiSeq3000 and BGISEQ-500 sequencing platform over whole genome sequencing metagenomics data.

Genomics Inform. 2023 Dec;21(4):e49. doi: 10.5808/gi.23072. Epub 2023 Dec 29.

Are the predicted known bacterial strains in a sample really present? A case study.

PLoS One. 2023 Oct 13;18(10):e0291964. doi: 10.1371/journal.pone.0291964. eCollection 2023.

Wochenende - modular and flexible alignment-based shotgun metagenome analysis.

BMC Genomics. 2022 Nov 11;23(1):748. doi: 10.1186/s12864-022-08985-9.

Bacterial low-abundant taxa are key determinants of a healthy airway metagenome in the early years of human life.

Comput Struct Biotechnol J. 2021 Dec 15;20:175-186. doi: 10.1016/j.csbj.2021.12.008. eCollection 2022.

Evaluation of a high-throughput, cost-effective Illumina library preparation kit.

Sci Rep. 2021 Aug 5;11(1):15925. doi: 10.1038/s41598-021-94911-0.

Estimating the Optimum Coverage and Quality of Amplicon Sequencing With Taylor's Power Law Extensions.

Front Bioeng Biotechnol. 2020 May 15;8:372. doi: 10.3389/fbioe.2020.00372. eCollection 2020.

Analytical Biases Associated with GC-Content in Molecular Evolution.

Front Genet. 2017 Feb 15;8:16. doi: 10.3389/fgene.2017.00016. eCollection 2017.
