


SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data.

Affiliations

Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA.

Department of Computer Science, University of Memphis, Memphis, TN, USA.

Publication information

Genome Biol. 2021 Jan 25;22(1):37. doi: 10.1186/s13059-020-02254-2.

DOI: 10.1186/s13059-020-02254-2
PMID: 33487172
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7829059/
Abstract

BACKGROUND

There is currently no method to precisely measure the errors that arise within the sequencing instrument (the sequencer) itself. Such measurement is critical for next-generation sequencing applications aimed at discovering the genetic makeup of heterogeneous cellular populations.

RESULTS

We propose a novel computational method, SequencErr, to address this challenge by measuring the base correspondence between overlapping regions in forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed the sequencer error rate to be ~ 10 per million (pm) and 1.4% of sequencers and 2.7% of flow cells have error rates > 100 pm. At the flow cell level, error rates are elevated in the bottom surfaces and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy, and removal of outlier error-prone tiles improves sequencing accuracy. We demonstrate that SequencErr can reveal novel insights relative to the popular quality control method FastQC and achieve a 10-fold lower error rate than popular error correction methods including Lighter and Musket.
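
The core idea above can be sketched in Python: compare bases wherever the forward read and the reverse-complemented reverse read cover the same positions. This is a simplified illustration under stated assumptions, not SequencErr's implementation. It assumes no indels and fragments at least as long as each read (no adapter read-through). It also assumes Casava 1.8+ style read names (`instrument:run:flowcell:lane:tile:x:y`), from which a per-tile key is taken. The following are all hypothetical choices:

- the helper names `find_overlap`, `overlap_error_rates` and `flag_tiles`;
- the 20-base minimum overlap;
- the 10% mismatch tolerance used to accept an overlap;
- the default 100 pm cutoff.

The abstract quotes 100 pm for sequencers and flow cells and does not say how outlier tiles are defined. The rate here is simply mismatches per million compared overlap positions; the paper's exact normalization may differ.

```python
from collections import defaultdict
from typing import Iterable

# Complement table; characters outside ACGTN pass through unchanged.
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def find_overlap(r1: str, r2rc: str, min_overlap: int = 20, max_mm_frac: float = 0.1) -> int:
    """Longest k such that r1[-k:] matches r2rc[:k] within max_mm_frac, else 0.

    Assumption: no indels, and the fragment is at least as long as each read.
    """
    for k in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-k:], r2rc[:k]))
        if mismatches <= max_mm_frac * k:
            return k
    return 0


def tile_of(read_name: str) -> str:
    """'flowcell:lane:tile' from a Casava 1.8+ header 'instrument:run:flowcell:lane:tile:x:y ...'."""
    fields = read_name.lstrip("@").split()[0].split(":")
    return ":".join(fields[2:5])


def overlap_error_rates(pairs: Iterable[tuple[str, str, str]], **kw) -> dict[str, tuple[int, int]]:
    """Map each tile to (mismatches, compared overlap positions) over (name, R1, R2) triples."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for name, r1, r2 in pairs:
        r2rc = revcomp(r2)  # bring R2 onto the same strand as R1
        k = find_overlap(r1, r2rc, **kw)
        if k == 0:
            continue  # no usable overlap for this pair
        tally = counts[tile_of(name)]
        for a, b in zip(r1[-k:], r2rc[:k]):
            if a == "N" or b == "N":
                continue  # skip no-calls
            tally[1] += 1
            tally[0] += a != b  # a disagreement means at least one read is wrong here
    return {tile: (mm, n) for tile, (mm, n) in counts.items()}


def per_million(mismatches: int, compared: int) -> float:
    """Mismatch rate in parts per million (pm)."""
    return 1e6 * mismatches / compared if compared else 0.0


def flag_tiles(rates: dict[str, tuple[int, int]], threshold_pm: float = 100.0) -> list[str]:
    """Tiles whose overlap mismatch rate exceeds threshold_pm (hypothetical cutoff)."""
    return sorted(t for t, (mm, n) in rates.items() if per_million(mm, n) > threshold_pm)
```

Scanning from the longest candidate overlap downward keeps the longest acceptable alignment, which lowers the chance of accepting a short, spurious match. A production tool would also need to handle indels, base qualities and read-through into adapters.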

CONCLUSIONS

Our study reveals novel insights into the nature of DNA sequencing errors incurred on DNA sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
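
One way to picture "computationally suppressing" sequencer errors in existing data is to mask overlap positions where the two reads of a pair disagree. Assuming both reads derive from the same template, at least one of the two base calls there is likely wrong. The sketch below does exactly that, with the following assumptions:

- It is for illustration only and is not necessarily how SequencErr suppresses errors.
- `mask_overlap_disagreements` is a hypothetical helper.
- The overlap length `k` is assumed to be known already, for example from an overlap search.
- The mask character `N` is an arbitrary choice.

```python
# Complement table; characters outside ACGTN pass through unchanged.
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def mask_overlap_disagreements(r1: str, r2: str, k: int, mask: str = "N") -> tuple[str, str, int]:
    """Mask bases in the k-base R1/R2 overlap where the two reads disagree.

    Hypothetical helper. Assumes the last k bases of r1 overlap the first k bases
    of revcomp(r2) (no indels, no adapter read-through). Returns the masked reads
    in their original orientations plus the number of masked positions.
    """
    n1, n2 = len(r1), len(r2)
    if not 0 <= k <= min(n1, n2):
        raise ValueError("overlap length out of range")
    r2rc = revcomp(r2)
    s1, s2 = list(r1), list(r2)
    masked = 0
    for i in range(k):
        j1 = n1 - k + i  # overlap position i, indexed in R1
        j2 = n2 - 1 - i  # the same template base, indexed in R2 as sequenced
        if r1[j1] != r2rc[i]:
            s1[j1] = mask
            s2[j2] = mask
            masked += 1
    return "".join(s1), "".join(s2), masked
```

Masking both calls avoids guessing which read is correct. Downstream tools such as variant callers would then treat those positions as missing rather than as evidence for a low-frequency allele.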
