Design and implementation of a metagenomic analytical pipeline for respiratory pathogen detection.
Affiliations
Institute of Biology, Federal University of Bahia (UFBA), Salvador, Brazil.
Center for Data and Knowledge Integration for Health (CIDACS), Gonçalo Moniz Institute, Oswaldo Cruz Foundation (Fiocruz), Salvador, Bahia, Brazil.
Publication information
BMC Res Notes. 2024 Oct 3;17(1):291. doi: 10.1186/s13104-024-06964-9.
OBJECTIVE
We developed an in-house bioinformatics pipeline to improve the detection of respiratory pathogens in metagenomic sequencing data. The pipeline addresses the need for fast turnaround, high accuracy, scalability, and reproducibility in a high-performance computing environment.
RESULTS
We evaluated our pipeline using ninety synthetic metagenomes designed to simulate nasopharyngeal swab samples. The pipeline identified 177 of the 204 respiratory pathogens present in the compositions (about 87%), with an average processing time of approximately 4 min per sample (1 million paired-end reads of 150 base pairs). Across all 470 taxa included in the compositions, the pipeline identified 420 (about 89%) and achieved a correlation of 0.9 between actual and predicted relative abundances. Among the identified taxa, 27 were significantly underestimated or overestimated, of which only three were clinically relevant pathogens. We also validated the pipeline on a clinical dataset from a study of metagenomic pathogen characterization in patients with acute respiratory infections, and it identified all pathogens responsible for the diagnosed infections. These findings underscore the pipeline's effectiveness in pathogen detection and highlight its potential utility in respiratory pathogen surveillance.
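The detection figures above can be reproduced with simple arithmetic, and the abundance comparison can be illustrated with a small sketch. The code below is not the authors' evaluation code: the function names are hypothetical, the abstract does not state which correlation coefficient was used (Pearson is assumed here), and the abundance vectors are made-up toy values rather than data from the study.

```python
from math import sqrt


def detection_rate(identified: int, expected: int) -> float:
    """Fraction of expected taxa that were identified (hypothetical helper)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return identified / expected


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: the abstract only says "correlation"; Pearson is used
    here purely for illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Counts taken from the abstract.
pathogen_recall = detection_rate(177, 204)  # respiratory pathogens
taxa_recall = detection_rate(420, 470)      # all taxa in the compositions

# Toy relative abundances (invented for illustration, NOT study data).
actual = [0.40, 0.30, 0.20, 0.10]
predicted = [0.38, 0.33, 0.17, 0.12]
r = pearson(actual, predicted)

print(f"pathogen detection rate: {pathogen_recall:.3f}")
print(f"taxa detection rate:     {taxa_recall:.3f}")
print(f"toy abundance correlation: {r:.3f}")
```

With real data, `actual` would hold the designed relative abundance of each taxon in a synthetic metagenome and `predicted` the pipeline's estimate for the same taxa in the same order.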