Assessment of SARS-CoV-2 Genome Sequencing: Quality Criteria and Low-Frequency Variants.

Affiliation

Institute of Microbiology, Laboratory of Genomics and Metagenomics, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland.

Publication information

J Clin Microbiol. 2021 Sep 20;59(10):e0094421. doi: 10.1128/JCM.00944-21. Epub 2021 Jul 28.

Abstract

Although many laboratories worldwide have developed sequencing capacity in response to the need for genome-based surveillance of SARS-CoV-2 variants, only a few have reported quality criteria to ensure sequence quality before lineage assignment and submission to public databases. Hence, we aimed to provide simple quality control criteria for SARS-CoV-2 sequencing to prevent erroneous interpretation of low-quality or contaminated data. We retrospectively investigated 647 SARS-CoV-2 genomes obtained over 10 tiled-amplicon sequencing runs. We extracted 26 potentially relevant metrics covering the entire workflow from sample selection to bioinformatics analysis. Based on data distribution, critical values were established for 11 selected metrics to prompt further quality investigations for problematic samples, in particular those with a low viral RNA quantity. Low-frequency variants (<70% of supporting reads) can result from PCR amplification errors, sample cross-contamination, or the presence of distinct SARS-CoV-2 genomes in the sequenced sample. The number and the prevalence of low-frequency variants can be used as a robust quality criterion to identify possible sequencing errors or contamination. Overall, we propose 11 metrics with fixed cutoff values as a simple tool to evaluate the quality of SARS-CoV-2 genomes, among which are cycle thresholds, mean depth, proportion of the genome covered at least 10×, and the number of low-frequency variants combined with mutation prevalence data.
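To make the kind of check described in the abstract concrete, the following is a minimal Python sketch of a per-sample quality screen using four of the named metrics (cycle threshold, mean depth, proportion of the genome covered at least 10×, and number of low-frequency variants). Only the 70% read-support limit and the 10× depth come from the abstract. The abstract does not state the cutoff values, so the numbers in `EXAMPLE_CUTOFFS` are hypothetical placeholders, and all names (`Variant`, `qc_flags`, and so on) are illustrative assumptions rather than the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List

LOW_FREQ_UPPER = 0.70  # from the abstract: low-frequency means <70% of supporting reads
MIN_DEPTH = 10         # from the abstract: genome covered at least 10x

# HYPOTHETICAL placeholder cutoffs; the abstract does not give the real values.
EXAMPLE_CUTOFFS: Dict[str, float] = {
    "max_ct": 30.0,
    "min_mean_depth": 50.0,
    "min_fraction_covered": 0.90,
    "max_low_freq_variants": 0,
}


@dataclass
class Variant:
    position: int
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # all reads covering the position

    @property
    def frequency(self) -> float:
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def fraction_covered(depths: List[int], min_depth: int = MIN_DEPTH) -> float:
    """Proportion of genome positions with depth >= min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def count_low_frequency(variants: List[Variant], upper: float = LOW_FREQ_UPPER) -> int:
    """Number of variants supported by some, but fewer than `upper`, of the reads."""
    return sum(1 for v in variants if 0 < v.frequency < upper)


def qc_flags(depths: List[int], variants: List[Variant], ct: float,
             cutoffs: Dict[str, float]) -> List[str]:
    """Return the names of the checks that call for further investigation."""
    flags = []
    mean_depth = sum(depths) / len(depths) if depths else 0.0
    if ct > cutoffs["max_ct"]:
        flags.append("high_ct")
    if mean_depth < cutoffs["min_mean_depth"]:
        flags.append("low_mean_depth")
    if fraction_covered(depths) < cutoffs["min_fraction_covered"]:
        flags.append("low_coverage")
    if count_low_frequency(variants) > cutoffs["max_low_freq_variants"]:
        flags.append("many_low_freq_variants")
    return flags


# Toy sample: four genome positions and two variants (made-up numbers).
depths = [5, 20, 30, 40]
variants = [Variant(100, 30, 100), Variant(200, 95, 100)]
flags = qc_flags(depths, variants, ct=32.0, cutoffs=EXAMPLE_CUTOFFS)
```

A flagged sample is not automatically rejected in this sketch. In line with the abstract, a flag only prompts further quality investigation, for example comparing the low-frequency variants against mutation prevalence data.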

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b0c/8451431/d2032792f645/jcm.00944-21-f001.jpg
