GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms.

Affiliations

Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, Frederiksberg C, 1871, Denmark.

Department of Environmental Science, Aarhus University, Frederiksborgvej 399, Roskilde, 4000, Denmark.

Publication information

Gigascience. 2020 Feb 1;9(2). doi: 10.1093/gigascience/giaa008.

PMID:32052832

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7016772/

Abstract

BACKGROUND

Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, the conclusions drawn from it are at risk from biases inherent to DNA sequencing methods, including inaccurate abundance estimates that depend on genomic guanine-cytosine (GC) content.
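To make the abundance problem concrete, the sketch below assumes two organisms with equal genome sizes present at equal true abundance, and a workflow that recovers reads from the GC-poor one at a hypothetical one-tenth the rate of the other (a factor chosen only to echo the >10-fold coverage gap reported in the Results). Abundance is then estimated naively as each organism's share of reads. The organism names, yields, and the function `observed_abundance` are illustrative assumptions, not values or code from the study.

```python
def observed_abundance(true_abundance: dict[str, float],
                       relative_yield: dict[str, float]) -> dict[str, float]:
    """Relative abundance estimated from read shares when each organism's
    reads are recovered at a (hypothetical) GC-dependent relative yield.
    Assumes equal genome sizes, so read share stands in for abundance."""
    weighted = {name: true_abundance[name] * relative_yield[name]
                for name in true_abundance}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical community: two organisms at equal true abundance.
true_ab = {"gc_poor_30pct": 0.5, "gc_mid_50pct": 0.5}
# Assumed yields: the GC-poor organism gives 10x fewer reads per genome copy.
yields = {"gc_poor_30pct": 0.1, "gc_mid_50pct": 1.0}

est = observed_abundance(true_ab, yields)
for name, value in est.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, an organism that truly makes up half the community is reported at roughly 9%, which is the kind of under-representation of GC-poor organisms the title refers to.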

RESULTS

We explored such GC biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC bias profiles varied among library preparation protocols and sequencing platforms. Our workflows using MiSeq and NextSeq were hindered by major GC biases that became increasingly severe outside the 45-65% GC range. This led to falsely low coverage in GC-rich and especially GC-poor sequences: genomic windows with 30% GC content had >10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates tightly with coverage bias. The PacBio and HiSeq platforms showed GC bias profiles similar to each other but distinct from those of the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not afflicted by GC bias.
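A window-level comparison like the 30% versus 50% GC figure above can be sketched by splitting a reference into fixed, non-overlapping windows, computing each window's GC percentage and mean read depth, and averaging depth per GC bin. The window size, the 5-point GC bins, and the helper names `gc_percent` and `coverage_by_gc_bin` are assumptions for illustration; this is not necessarily the authors' exact procedure, and the depth values are synthetic. In practice, per-base depth would come from reads mapped back to the reference.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def gc_percent(seq: str) -> Optional[int]:
    """Floored GC percentage over unambiguous bases (None if no A/C/G/T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return (100 * gc) // acgt if acgt else None


def coverage_by_gc_bin(genome: str, depth: list[float], window: int,
                       bin_pct: int = 5) -> dict[int, float]:
    """Mean depth of non-overlapping windows, grouped into GC bins.

    `depth` holds per-base read depth for `genome` (one value per base).
    A window at 30-34% GC goes into bin 30, 50-54% into bin 50, and so on.
    """
    if len(depth) != len(genome):
        raise ValueError("depth must have one value per base")
    bins: dict[int, list[float]] = defaultdict(list)
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_percent(genome[start:start + window])
        if gc is None:  # skip windows made entirely of N or other symbols
            continue
        bins[gc // bin_pct * bin_pct].append(mean(depth[start:start + window]))
    return {b: mean(v) for b, v in sorted(bins.items())}


# Synthetic example: one 30% GC window and two 50% GC windows of 100 bp.
genome = ("GC" * 15 + "AT" * 35) + ("GC" * 25 + "AT" * 25) * 2
depth = [2.0] * 100 + [20.0] * 100 + [30.0] * 100
cov = coverage_by_gc_bin(genome, depth, window=100)
print(cov)
print(f"50% vs 30% GC: {cov[50] / cov[30]:.1f}-fold")
```

Integer percentages are used for binning so that a window at exactly 30% GC cannot slip into the 25% bin through floating-point rounding.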

CONCLUSIONS

These findings identify GC bias as a potential source of difficulty in genome sequencing, one that could be addressed pre-emptively through methodological optimization, provided that the GC bias inherent to the relevant workflow is understood. Furthermore, it is recommended that a more critical approach be taken to quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC bias before drawing conclusions, or should use a demonstrably unbiased workflow.
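The abstract does not prescribe how to account for GC bias. One commonly used idea, swapped in here purely as an illustration, is to rescale coverage by GC bin so that every bin ends up with the same median depth. The sketch below assumes per-window GC percentages and mean depths have already been computed; the function name `gc_bin_normalize`, the 5-point bins, and the input values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_bin_normalize(window_gc_pct: list[int], window_depth: list[float],
                     bin_pct: int = 5) -> list[float]:
    """Rescale window depths so that every GC bin shares the global median.

    corrected = depth * global_median / median depth of the window's GC bin
    """
    if len(window_gc_pct) != len(window_depth):
        raise ValueError("need one GC value per window")
    bin_of = [gc // bin_pct * bin_pct for gc in window_gc_pct]
    by_bin: dict[int, list[float]] = defaultdict(list)
    for b, d in zip(bin_of, window_depth):
        by_bin[b].append(d)
    bin_median = {b: median(v) for b, v in by_bin.items()}
    global_median = median(window_depth)
    corrected = []
    for b, d in zip(bin_of, window_depth):
        m = bin_median[b]
        # A bin with zero median depth carries no signal to rescale; keep as is.
        corrected.append(d * global_median / m if m > 0 else d)
    return corrected


# Hypothetical per-window values: GC-poor windows are under-covered.
gc_pct = [30, 30, 50, 50, 50]
depth = [2.0, 2.4, 20.0, 22.0, 24.0]
corrected = gc_bin_normalize(gc_pct, depth)
print([round(x, 2) for x in corrected])
```

This naive version has a real limitation in metagenomes: GC content also tracks taxonomy, so a bin's median mixes true abundance with bias, and equalizing bins can erase genuine differences between organisms. It also cannot recover regions that received no reads at all. Estimating the bias curve from genomes of known composition sequenced with the same workflow may be a safer basis.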

Similar articles

Comparison of the sequencing bias of currently available library preparation kits for Illumina sequencing of bacterial genomes and metagenomes.

DNA Res. 2019 Oct 1;26(5):391-398. doi: 10.1093/dnares/dsz017.

Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data.

BMC Bioinformatics. 2016 Mar 11;17:125. doi: 10.1186/s12859-016-0976-y.

Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples.

PLoS One. 2016 Oct 19;11(10):e0165015. doi: 10.1371/journal.pone.0165015. eCollection 2016.

MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

Gigascience. 2017 Mar 1;6(3):1-10. doi: 10.1093/gigascience/gix007.

Assembly methods for nanopore-based metagenomic sequencing: a comparative study.

Sci Rep. 2020 Aug 12;10(1):13588. doi: 10.1038/s41598-020-70491-3.

Identifying Bacterial Airways Infection in Stable Severe Asthma Using Oxford Nanopore Sequencing Technologies.

Microbiol Spectr. 2022 Apr 27;10(2):e0227921. doi: 10.1128/spectrum.02279-21. Epub 2022 Mar 24.

Ultra-deep, long-read nanopore sequencing of mock microbial community standards.

Gigascience. 2019 May 1;8(5). doi: 10.1093/gigascience/giz043.

Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications.

J Biomol Tech. 2017 Apr;28(1):46-55. doi: 10.7171/jbt.17-2801-008. Epub 2017 Mar 21.

Evaluating metagenomics and targeted approaches for diagnosis and surveillance of viruses.

Genome Med. 2024 Sep 9;16(1):111. doi: 10.1186/s13073-024-01380-x.

Cited by

Dissemination of OXA-23 carbapenemase-producing and is driven by transposon-carrying lineages in the UK.

Microb Genom. 2025 Sep;11(9). doi: 10.1099/mgen.0.001502.

SyFi: generating and using sequence fingerprints to distinguish SynCom isolates.

Microb Genom. 2025 Sep;11(9). doi: 10.1099/mgen.0.001461.

Pore-C sequencing identifies episome-driven chromosome conformation perturbations differentiating pneumococcal epigenetic variants.

PLoS Pathog. 2025 Aug 14;21(8):e1013392. doi: 10.1371/journal.ppat.1013392. eCollection 2025 Aug.

Variation of and associations with the depth and evenness of sequencing coverage in archived plastid genomes.

Res Sq. 2025 Jul 14:rs.3.rs-5784537. doi: 10.21203/rs.3.rs-5784537/v1.

Variation of and associations with the depth and evenness of sequencing coverage in archived plastid genomes.

Sci Rep. 2025 Jul 19;15(1):26294. doi: 10.1038/s41598-025-11568-9.

Avian interferon regulatory factor (IRF) family reunion: IRF3 and IRF9 found.

BMC Biol. 2025 Jul 1;23(1):180. doi: 10.1186/s12915-025-02261-4.

Impact of DNA Extraction and 16S rRNA Gene Amplification Strategy on Microbiota Profiling of Faecal Samples.

Int J Mol Sci. 2025 May 29;26(11):5226. doi: 10.3390/ijms26115226.

Biases from Oxford Nanopore library preparation kits and their effects on microbiome and genome analysis.

BMC Genomics. 2025 May 19;26(1):504. doi: 10.1186/s12864-025-11649-z.

Observation Bias in Metabarcoding.

Mol Ecol Resour. 2025 Oct;25(7):e14119. doi: 10.1111/1755-0998.14119. Epub 2025 May 15.

GCfix: a fast and accurate fragment length-specific method for correcting GC bias in cell-free DNA.

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf293.
