Ramos Lopez Daniel, Flores Francisco J, Espindola Andres S
Institute for Biosecurity and Microbial Forensics (IBMF), Oklahoma State University, Stillwater, OK 74078, USA.
Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK 74078, USA.
Biology (Basel). 2025 Jan 14;14(1):69. doi: 10.3390/biology14010069.
Metagenomic analysis has enabled microbiome diversity in environmental samples to be measured without prior targeted enrichment. Functional and phylogenetic studies based on microbial diversity retrieved with high-throughput sequencing (HTS) platforms have advanced from detecting known organisms and discovering unknown species to applications in disease diagnostics. Robust validation processes are essential for test reliability and require standard samples and databases derived from both real samples and in silico generated artificial controls. We propose MeStanG as a resource for generating HTS Nanopore data sets to evaluate current and emerging bioinformatics pipelines. MeStanG allows samples to be designed with user-defined organism abundances expressed as numbers of reads, reference sequences, and predetermined or custom errors defined by sequencing profiles. The simulator pipeline was evaluated by analyzing its output, mock metagenomic samples containing known read abundances, using read mapping, genome assembly, and taxonomic classification in three scenarios: a bacterial community composed of nine different organisms, samples resembling pathogen-infected wheat plants, and a serial dilution of a viral pathogen. The evaluation consistently reported the same organisms and read abundances as specified in the mock metagenomic sample design. Based on this performance and its novel capacity to generate an exact number of reads, MeStanG can be used by scientists to develop mock metagenomic samples (artificial HTS data sets) for assessing the diagnostic performance metrics of bioinformatic pipelines, with a choice of predetermined or customized models for research and training.
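The idea of designing a mock sample by exact read counts can be illustrated with a minimal Python sketch. This is not MeStanG's implementation or interface: the function `simulate_reads`, its parameters, the toy reference sequences, and the uniform substitution-only error model are all assumptions made for illustration, and real Nanopore error profiles also include insertions and deletions.

```python
import random
from collections import Counter

def simulate_reads(references, read_counts, read_length=40, error_rate=0.05, seed=0):
    """Draw exactly read_counts[name] reads from each reference sequence.

    Hypothetical helper: errors are modeled as uniform random substitutions
    only, which is a simplification of any real sequencing error profile.
    """
    rng = random.Random(seed)
    reads = []
    for organism, count in read_counts.items():
        genome = references[organism]
        if len(genome) < read_length:
            raise ValueError(f"reference for {organism} is shorter than read_length")
        for _ in range(count):
            # Pick a random start so the read fits inside the reference.
            start = rng.randrange(len(genome) - read_length + 1)
            fragment = genome[start:start + read_length]
            # Replace each base with a different base with probability error_rate.
            noisy = "".join(
                rng.choice([b for b in "ACGT" if b != base])
                if rng.random() < error_rate else base
                for base in fragment
            )
            reads.append((organism, noisy))
    rng.shuffle(reads)  # mix organisms, as reads would be in a sequenced sample
    return reads

# Toy sample design: abundances are given directly as numbers of reads.
references = {"org_A": "ACGT" * 50, "org_B": "GGCATTAC" * 25}
design = {"org_A": 7, "org_B": 3}
reads = simulate_reads(references, design)
observed = Counter(organism for organism, _ in reads)
```

Because abundances are specified as read counts rather than proportions, the per-organism totals in `observed` match `design` exactly, which is what lets a downstream pipeline's reported abundances be checked against a known ground truth.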