PEAR: a fast and accurate Illumina Paired-End reAd mergeR.

Author information

The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Graduate School for Computing in Medicine and Life Sciences, Institut für Neuro- und Bioinformatik, University of Lübeck, 23538 Lübeck and Karlsruhe Institute of Technology, Institute for Theoretical Informatics, Postfach 6980, 76128 Karlsruhe, Germany.

Publication information

Bioinformatics. 2014 Mar 1;30(5):614-20. doi: 10.1093/bioinformatics/btt593. Epub 2013 Oct 18.

DOI: 10.1093/bioinformatics/btt593

PMID: 24142950

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3933873/

Abstract

MOTIVATION

The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail to generate reliable results. Therefore, a robust tool is needed to merge paired-end reads that exhibit varying overlap lengths because of varying target fragment lengths.
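The three fragment-length regimes named above can be made concrete with a little arithmetic. The sketch below is an illustration only, not part of PEAR: the function name `expected_overlap` is hypothetical, and it assumes both reads of a pair have the same length and contain no indels.

```python
def expected_overlap(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by both reads of a pair.

    Assumes equal-length reads without indels (an illustrative
    simplification, not PEAR's model).
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    if fragment_len < read_len:
        # Fragment shorter than one read: both reads cover the whole
        # fragment and then run past its end.
        return fragment_len
    # Otherwise the reads share 2 * read_len - fragment_len bases,
    # which drops to zero once the fragment reaches twice the read length.
    return max(0, 2 * read_len - fragment_len)


if __name__ == "__main__":
    for fragment in (100, 250, 400):
        print(fragment, expected_overlap(fragment, read_len=150))
```

With 150 bp reads, a 250 bp fragment gives a 50 bp overlap, a 400 bp fragment gives none (the pair cannot be merged), and a 100 bp fragment is covered entirely by both reads.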

RESULTS

We present the PEAR software for merging raw Illumina paired-end reads from target fragments of varying length. The program evaluates all possible paired-end read overlaps and does not require the target fragment size as input. It also implements a statistical test for minimizing false-positive results. Tests on simulated and empirical data show that PEAR consistently generates highly accurate merged paired-end reads. A highly optimized implementation allows for merging millions of paired-end reads within a few minutes on a standard desktop computer. On multi-core architectures, the parallel version of PEAR shows linear speedups compared with the sequential version of PEAR.
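To illustrate what evaluating all possible overlaps can look like, here is a minimal sketch that tries every overlap length between the forward read and the reverse complement of the reverse read. It is not PEAR's implementation: the scoring (+1 per match, -1 per mismatch), the `min_overlap` default, and the function names are assumptions made for the example; base quality scores and the statistical test are left out; and only the case where the fragment is at least as long as a read is handled.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_overlap(forward: str, reverse: str,
                 min_overlap: int = 10) -> Optional[Tuple[int, int]]:
    """Score every candidate overlap; return (length, score) of the best.

    Hypothetical scoring: +1 per match, -1 per mismatch. Returns None
    when no overlap of at least min_overlap bases has a positive score.
    """
    rc = reverse_complement(reverse)
    best = None
    for k in range(min_overlap, min(len(forward), len(rc)) + 1):
        # Compare the last k bases of the forward read with the first
        # k bases of the reverse-complemented reverse read.
        score = sum(1 if a == b else -1
                    for a, b in zip(forward[-k:], rc[:k]))
        if score > 0 and (best is None or score > best[1]):
            best = (k, score)
    return best


def merge_pair(forward: str, reverse: str,
               min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair at its best-scoring overlap, or return None."""
    hit = best_overlap(forward, reverse, min_overlap)
    if hit is None:
        return None
    k, _ = hit
    return forward + reverse_complement(reverse)[k:]


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCGTACGAT"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[-20:])
    print(merge_pair(fwd, rev) == fragment)
```

Because no fragment size is passed in, the overlap length is inferred from the reads alone, which mirrors the property described above. A real merger would also need a way to reject overlaps that arise by chance, which is the role the abstract assigns to the statistical test.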

AVAILABILITY AND IMPLEMENTATION

PEAR is implemented in C and uses POSIX threads. It is freely available at http://www.exelixis-lab.org/web/software/pear.
