Synthetic spike-in standards improve run-specific systematic error analysis for DNA and RNA sequencing.

Affiliation

Biochemical Science Division, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America.

Publication information

PLoS One. 2012;7(7):e41356. doi: 10.1371/journal.pone.0041356. Epub 2012 Jul 31.

DOI:10.1371/journal.pone.0041356
PMID:22859977
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3409179/
Abstract

While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate the association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being "recalibrated" (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration.
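The core idea of spike-in-based recalibration, comparing reported quality scores against the error rate actually observed on reads from a sequence of known composition, can be illustrated with a minimal Python sketch. This is not GATK's implementation. It assumes reads have already been aligned to the spike-in sequences, treats every mismatch against the known spike-in base as a sequencing error, and uses only two covariates (reported quality and the read's dinucleotide context), whereas GATK's recalibration also models other covariates such as sequencing cycle. The function names and the +1/+2 pseudocount are hypothetical choices made for this illustration.

```python
import math
from collections import defaultdict


def phred_to_prob(q):
    """Phred-scaled quality Q -> error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Error probability p -> Phred-scaled quality Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def empirical_quality_table(aligned_bases):
    """Estimate an empirical quality per (reported quality, dinucleotide) bin.

    aligned_bases: iterable of (reported_q, prev_base, read_base, ref_base)
    tuples, where ref_base is the known spike-in base at that position.
    Because the spike-in sequence is known (assumption: no true variants),
    every mismatch is counted as a sequencing error.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [mismatches, observations]
    for reported_q, prev_base, read_base, ref_base in aligned_bases:
        key = (reported_q, prev_base + read_base)  # e.g. (30, "AC")
        counts[key][1] += 1
        if read_base != ref_base:
            counts[key][0] += 1

    table = {}
    for key, (mismatches, observations) in counts.items():
        # Hypothetical +1/+2 pseudocounts keep the estimate finite
        # when a bin contains no observed mismatches.
        p_err = (mismatches + 1) / (observations + 2)
        table[key] = prob_to_phred(p_err)
    return table


def overestimation(table):
    """Reported minus empirical quality; positive means the run is overconfident."""
    return {key: key[0] - q_emp for key, q_emp in table.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 1000 spike-in bases reported at Q30 in an "AC"
    # context, 10 of which disagree with the known spike-in sequence.
    bases = [(30, "A", "C", "C")] * 990 + [(30, "A", "C", "T")] * 10
    table = empirical_quality_table(bases)
    for key, delta in overestimation(table).items():
        print(key, round(table[key], 1), round(delta, 1))
```

In the toy data, the "AC" bin reported at Q30 comes out at an empirical quality of about Q19.6, so the run would be overestimating that dinucleotide by roughly 10 units. For scale, because p = 10^(-Q/10), the mean improvement of 5 Phred units reported in the abstract corresponds to about a 3.2-fold difference in error probability, and 13 units to about a 20-fold difference.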
