Zhao Weizhong, Chen James J, Foley Steven, Wang Yuping, Zhao Shaohua, Basinger John, Zou Wen
Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA.
Division of Microbiology, National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA.
Biomark Med. 2015;9(11):1253-64. doi: 10.2217/bmm.15.88. Epub 2015 Oct 26.
AIM: The aim was to develop an analytical pipeline for the analysis of specific genes and for biomarker discovery from next-generation sequencing (NGS) data.
MATERIALS & METHODS: As a test case, the fliC gene reference sequences of 24 Salmonella enterica strains of 13 serotypes and NGS reads of 32 serovar Newport, 48 Montevideo and 115 Enteritidis outbreak isolates were retrieved from the National Center for Biotechnology Information database.
RESULTS: An analytical pipeline was established, consisting of four steps: reference sequence retrieval and template sequence determination; NGS read retrieval; multiple sequence alignment and phylogenetic analysis; and data mining and biomarker discovery.
CONCLUSION: The pipeline provides an effective bioinformatics tool for clarifying genetic diversity and discovering marker sequences for pathogen characterization and surveillance.
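The data mining step of the four-step pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their algorithm, so the function name `discriminating_columns`, the toy alignment and the group labels below are all hypothetical. The sketch assumes the multiple sequence alignment has already been computed and simply reports alignment columns that are conserved within each group (for example, a serotype) but differ between groups, which is one plausible way to shortlist candidate marker positions.

```python
from collections import defaultdict


def discriminating_columns(aligned, labels):
    """Return alignment columns that separate the labelled groups.

    aligned: dict of isolate id -> aligned sequence (equal lengths, '-' for gaps)
    labels:  dict of isolate id -> group label (e.g. a serotype name)

    A column is reported when every group has exactly one residue at that
    position and at least two groups have different residues.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths

    markers = []
    for col in range(length):
        # Collect the residues observed in this column, per group.
        residues_by_group = defaultdict(set)
        for isolate, seq in aligned.items():
            residues_by_group[labels[isolate]].add(seq[col])

        # Skip columns that vary inside any single group.
        if any(len(res) != 1 for res in residues_by_group.values()):
            continue

        # Keep columns where the per-group residues are not all identical.
        consensus = {grp: next(iter(res)) for grp, res in residues_by_group.items()}
        if len(set(consensus.values())) > 1:
            markers.append((col, consensus))
    return markers


# Toy data (hypothetical, not taken from the study).
toy_alignment = {
    "iso1": "ATGCA",
    "iso2": "ATGCA",
    "iso3": "ATTCA",
    "iso4": "ATTCG",
}
toy_labels = {"iso1": "A", "iso2": "A", "iso3": "B", "iso4": "B"}

markers = discriminating_columns(toy_alignment, toy_labels)
print(markers)  # [(2, {'A': 'G', 'B': 'T'})]
```

In the toy data, column 2 (0-based) is reported because group A carries G and group B carries T, while column 4 is skipped because group B is not conserved there. A real analysis would also need to handle gaps, ambiguous bases and sequencing errors, which this sketch ignores.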