PacRAT: a program to improve barcode-variant mapping from PacBio long reads using multiple sequence alignment.

Affiliations

Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.

Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195, USA.

Publication information

Bioinformatics. 2022 May 13;38(10):2927-2929. doi: 10.1093/bioinformatics/btac165.

DOI:10.1093/bioinformatics/btac165

PMID:35561209

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9306489/

Abstract

SUMMARY

Use of PacBio sequencing for characterizing barcoded libraries of genetic variants is on the rise. However, current approaches in resolving PacBio sequencing artifacts can result in a high number of incorrectly identified or unusable reads. Here, we developed a PacBio Read Alignment Tool (PacRAT) that improves the accuracy of barcode-variant mapping through several steps of read alignment and consensus calling. To quantify the performance of our approach, we simulated PacBio reads from eight variant libraries of various lengths and showed that PacRAT improves the accuracy in pairing barcodes and variants across these libraries. Analysis of real (non-simulated) libraries also showed an increase in the number of reads that can be used for downstream analyses when using PacRAT.
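The consensus-calling idea described in the summary can be illustrated with a minimal sketch. This is not PacRAT's own code. It assumes the reads for each barcode have already been aligned to equal length by a multiple sequence aligner, with "-" marking gaps. The function names `column_consensus` and `map_barcodes` and the `min_fraction` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def column_consensus(aligned_reads, min_fraction=0.5):
    """Majority-vote consensus over equal-length, gap-padded reads.

    Hypothetical helper: a column whose most common symbol does not exceed
    min_fraction of the reads is reported as "N"; consensus gaps are dropped.
    """
    if not aligned_reads:
        raise ValueError("need at least one aligned read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            symbol = "N"  # no strict majority at this position
        if symbol != "-":  # a majority gap means the base is absent
            consensus.append(symbol)
    return "".join(consensus)


def map_barcodes(barcode_read_pairs):
    """Group aligned reads by barcode and call one consensus per barcode."""
    groups = defaultdict(list)
    for barcode, read in barcode_read_pairs:
        groups[barcode].append(read)
    return {bc: column_consensus(reads) for bc, reads in groups.items()}


if __name__ == "__main__":
    pairs = [
        ("ACGT", "ATG-CA"),
        ("ACGT", "ATGTCA"),  # one read carries a spurious insertion
        ("ACGT", "ATG-CA"),
        ("TTAG", "ATGCCA"),
    ]
    print(map_barcodes(pairs))
```

In this toy input, the insertion seen in only one of the three reads for barcode ACGT is outvoted, so that barcode maps to ATGCA. A real pipeline would also need to handle barcode extraction, the alignment itself, and read-count filters, none of which are shown here.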

AVAILABILITY AND IMPLEMENTATION

PacRAT is written in Python and is freely available (https://github.com/dunhamlab/PacRAT).

SUPPLEMENTARY INFORMATION

Supplemental data are available at Bioinformatics online.

