Castanet: a pipeline for rapid analysis of targeted multi-pathogen genomic data.

Affiliations

Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, University of Oxford, Oxfordshire OX1 3SY, United Kingdom.

Radcliffe Department of Medicine, University of Oxford, West Wing John Radcliffe Hospital, Oxfordshire OX3 9DU, United Kingdom.

Publication information

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae591.

DOI: 10.1093/bioinformatics/btae591
PMID: 39360992
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11494375/
Abstract

MOTIVATION

Target enrichment strategies generate genomic data from multiple pathogens in a single process, greatly improving sensitivity over metagenomic sequencing and enabling cost-effective, high-throughput surveillance and clinical applications. However, uptake by research and clinical laboratories is constrained by an absence of computational tools that are specifically designed for the analysis of multi-pathogen enrichment sequence data. Here we present an analysis pipeline, Castanet, for use with multi-pathogen enrichment sequencing data. Castanet is designed to work with short-read data produced by existing targeted enrichment strategies, but can be readily deployed on any BAM file generated by another methodology. Also included are an optional graphical interface and installer script.

RESULTS

In addition to genome reconstruction, Castanet reports method-specific metrics that enable quantification of capture efficiency, estimation of pathogen load, differentiation of low-level positives from contamination, and assessment of sequencing quality. Castanet can be used as a traditional end-to-end pipeline for consensus generation, but its strength lies in the ability to process a flexible, pre-defined set of pathogens of interest directly from multi-pathogen enrichment experiments. In our tests, Castanet consensus sequences were accurate reconstructions of reference sequences, including in instances where multiple strains of the same pathogen were present. Castanet performs effectively on standard computers and can process the entire output of a 96-sample enrichment sequencing run (50M reads) using a single batch process command, in under 2 h.
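To make the idea of a capture-efficiency metric concrete, the following is a minimal sketch in Python. It is not Castanet's implementation, and the abstract does not define how Castanet computes its metrics. The function name `capture_summary`, its inputs, and the two statistics it returns (on-target fraction and reads per million per target) are assumptions chosen purely for illustration.

```python
from collections import Counter


def capture_summary(read_targets, total_reads):
    """Summarise how many reads in one sample hit each target organism.

    Hypothetical helper, not part of Castanet.
    read_targets: one entry per read, naming the target it mapped to,
                  or None if the read mapped to no target.
    total_reads:  total number of reads sequenced for the sample.
    """
    # Count mapped reads per target, ignoring off-target reads.
    counts = Counter(t for t in read_targets if t is not None)
    if total_reads <= 0:
        return {"on_target_fraction": 0.0, "reads_per_million": {}}
    on_target = sum(counts.values())
    return {
        # Share of all reads that mapped to any target.
        "on_target_fraction": on_target / total_reads,
        # Depth-normalised count per target, comparable across samples.
        "reads_per_million": {
            t: n * 1_000_000 / total_reads for t, n in counts.items()
        },
    }


# Toy input: four reads, three of which map to a target.
summary = capture_summary(["RSV", "RSV", None, "HIV"], total_reads=4)
print(summary["on_target_fraction"])
print(summary["reads_per_million"])
```

Normalising by total reads is one simple way to compare samples sequenced to different depths. Separating low-level positives from contamination would need more information than this sketch uses, such as controls run in the same batch.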

AVAILABILITY AND IMPLEMENTATION

Source code is freely available under the GPL-3 license at https://github.com/MultipathogenGenomics/castanet, implemented in Python 3.10 and supported on Ubuntu Linux 22.04. The data underlying this article are available in the European Nucleotide Archive at https://www.ebi.ac.uk/ena/browser/view/PRJEB77004.
