Aja-Macaya Pablo, Conde-Pérez Kelly, Trigo-Tasende Noelia, Buetas Elena, Nasser-Ali Mohammed, Nión Paula, Rumbo-Feal Soraya, Ladra Susana, Bou Germán, Mira Álex, Vallejo Juan A, Poza Margarita
Grupo de Investigación en Microbiología, Instituto de Investigación Biomédica de A Coruña (INIBIC), Complejo Hospitalario Universitario de A Coruña (CHUAC), Sergas, Universidade de A Coruña (UDC), As Xubias, 15006, A Coruña, Galicia, Spain.
Database Laboratory (LBD), CITIC, Universidade da Coruña (UDC), Campus de Elviña, 15071, A Coruña, Galicia, Spain.
Sci Rep. 2025 Jul 21;15(1):26486. doi: 10.1038/s41598-025-10999-8.
Discovery of disease-related bacterial biomarkers could be a useful approach for early prevention or diagnosis of various afflictions, such as colorectal cancer. This typically involves analyzing small regions of the 16S rRNA gene (e.g. V3V4) through short-read technologies like Illumina, obtaining genus-level results. However, recent developments in third-generation sequencing, such as the new R10.4.1 chemistry from Oxford Nanopore Technologies (ONT) and its improved basecalling models, are beginning to allow for a more complete and accessible species-level analysis through full-length 16S rRNA gene sequencing (spanning regions V1-V9). Thus, the goal of this study was to compare and evaluate both approaches, using colorectal cancer biomarker discovery as a representative case. This was achieved through the analysis of feces from 123 subjects, comparing both methods (Illumina-V3V4 with DADA2 and QIIME2 vs. ONT-V1V9 with Emu), multiple Dorado basecalling models (fast, hac and sup) and multiple databases (SILVA vs. Emu's Default database). The basecalling models broadly produced similar taxonomic output, but lower basecalling quality was associated with significantly more observed species and with differences in taxonomic identification (p-value<0.05). Database choice with Emu greatly influenced the identified species, with Emu's Default database yielding significantly higher diversity and more identified species than SILVA (p-value<0.05). However, because of its database structure, it at times overconfidently classified what should be an unknown species as the closest match. Bacterial abundance between Illumina-V3V4 and ONT-V1V9 at the genus level correlated well (R≥0.8). Nanopore sequencing identified more specific bacterial biomarkers for colorectal cancer than those obtained with Illumina, such as Parvimonas micra, Fusobacterium nucleatum, Peptostreptococcus stomatis, Peptostreptococcus anaerobius, Gemella morbillorum, Clostridium perfringens, Bacteroides fragilis and Sutterella wadsworthensis. Prediction of colorectal cancer through manual feature selection and machine learning resulted in an AUC of 0.87 with 14 species or 0.82 with just 4 species (P. micra, F. nucleatum, B. fragilis and Agathobaculum butyriciproducens). Full 16S rRNA V1V9 sequencing through Oxford Nanopore and its new R10.4.1 chemistry achieved accurate species-level bacterial identification, facilitating the discovery of more precise disease-related biomarkers and increasing the taxonomic fidelity of future microbiome analyses.
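As an illustration of the genus-level comparison between platforms, the following is a minimal sketch, not the authors' pipeline. It assumes species-level counts are collapsed to genus by taking the first word of a "Genus species" name, and that R is a Pearson coefficient (the abstract does not say which coefficient was used). All function names and the toy abundance values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def genus_relative_abundance(species_counts):
    """Collapse species-level counts to genus-level relative abundances.

    Assumes every key follows the "Genus species" naming pattern.
    """
    genus_counts = defaultdict(float)
    for species, count in species_counts.items():
        genus = species.split()[0]
        genus_counts[genus] += count
    total = sum(genus_counts.values())
    return {genus: count / total for genus, count in genus_counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def compare_platforms(profile_a, profile_b):
    """Correlate two genus profiles over the union of their genera.

    A genus missing from one profile is treated as abundance 0.
    """
    genera = sorted(set(profile_a) | set(profile_b))
    xs = [profile_a.get(g, 0.0) for g in genera]
    ys = [profile_b.get(g, 0.0) for g in genera]
    return pearson_r(xs, ys)


# Made-up toy data for one sample (not taken from the study).
ont_species = {
    "Bacteroides fragilis": 300,
    "Bacteroides uniformis": 200,
    "Fusobacterium nucleatum": 50,
    "Parvimonas micra": 30,
    "Faecalibacterium prausnitzii": 420,
}
illumina_genus = {
    "Bacteroides": 0.48,
    "Fusobacterium": 0.06,
    "Parvimonas": 0.02,
    "Faecalibacterium": 0.44,
}

ont_genus = genus_relative_abundance(ont_species)
r = compare_platforms(ont_genus, illumina_genus)
print(f"Genus-level Pearson R: {r:.3f}")
```

Collapsing the full-length (species-level) profile to genus before correlating puts both platforms at the resolution that the short-read V3V4 data supports, which is why the comparison is made at the genus level.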