Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge.

Affiliations

Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, USA.

Department of Oral Biology, University of Florida, 1395 Center Drive, Gainesville, FL, 32610, USA.

Publication information

Biol Direct. 2021 Jan 4;16(1):1. doi: 10.1186/s13062-020-00284-1.

DOI:10.1186/s13062-020-00284-1

PMID:33397406

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7780616/

Abstract

BACKGROUND

The composition of microbial communities can be location-specific, and the differential abundance of taxa within a location could help unravel city-specific signatures and accurately predict sample origin. In this study, whole genome shotgun (WGS) metagenomics data from samples across 16 cities around the world and from another 8 cities were provided as the main and mystery datasets, respectively, as part of the CAMDA 2019 MetaSUB "Forensic Challenge". Feature selection, normalization, three machine learning methods, PCoA (principal coordinates analysis) and ANCOM (analysis of composition of microbiomes) were applied to both the main and mystery datasets.
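As an illustration of the feature selection and normalization steps, here is a minimal Python sketch. It rests on assumptions the abstract does not confirm: "common features" is taken to mean taxa observed in every city, and normalization is taken to be log2 counts per million with voom-style offsets. The function names `common_features` and `log2_cpm` and the toy counts are hypothetical.

```python
import math

def common_features(samples_by_city):
    """Return taxa with a non-zero count in at least one sample of every city."""
    per_city = []
    for samples in samples_by_city.values():
        seen = set()
        for counts in samples:
            seen.update(taxon for taxon, c in counts.items() if c > 0)
        per_city.append(seen)
    return sorted(set.intersection(*per_city)) if per_city else []

def log2_cpm(counts, features):
    """log2 counts per million, with offsets of 0.5 (count) and 1 (library size).

    The library size is the total over all taxa in the sample, not only the
    selected features (an assumption made for this sketch).
    """
    lib_size = sum(counts.values())
    return [math.log2((counts.get(t, 0) + 0.5) / (lib_size + 1) * 1e6)
            for t in features]

# Toy counts per sample, grouped by city (invented for illustration only).
toy = {
    "city_A": [{"Bacillales": 30, "Lactobacillales": 10},
               {"Bacillales": 5, "Rhizobiales": 15}],
    "city_B": [{"Bacillales": 8, "Rhizobiales": 2}],
}
shared = common_features(toy)  # taxa usable as predictors for every city
matrix = [log2_cpm(s, shared) for city in toy.values() for s in city]
```

Each row of `matrix` is one sample and each column one shared taxon, which is the shape a classifier or an ordination method would typically take as input.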

RESULTS

Feature selection, combined with the machine learning methods, revealed that the combination of the common features was effective for predicting sample origin. Averaged over the three machine learning methods, the error rates were 11.93% for the main dataset and 30.37% for the mystery dataset. When samples from the main dataset were used to predict the labels of samples from the mystery dataset, nearly 89.98% of the test samples could be correctly labeled as "mystery" samples. PCoA showed that nearly 60% of the total variability of the data could be explained by the first two PCoA axes. Although many cities overlapped, some cities were separated in the PCoA. The results of ANCOM, combined with the importance scores from the Random Forest, indicated that the common "family" and "order" features of the main dataset and the common "order" features of the mystery dataset, respectively, provided the most efficient information for prediction.
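To make the "variability explained by the first two PCoA axes" figure concrete, the following is a rough pure-Python sketch of how such a proportion can be computed from a distance matrix. It is not the authors' pipeline: the abstract does not state which distance measure was used, the function name `pcoa_explained` is hypothetical, and the sketch assumes Euclidean-embeddable distances so that no eigenvalue is negative. Power iteration is used here only to avoid a linear-algebra dependency.

```python
import math

def pcoa_explained(dist, n_axes=2, iters=500):
    """Share of total variability captured by the first n_axes PCoA axes.

    Assumes `dist` is a symmetric matrix of Euclidean-embeddable distances,
    so every eigenvalue of the centred matrix is non-negative.
    """
    n = len(dist)
    # Gower double-centring of squared distances: B = -1/2 * J * D^2 * J
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    total = sum(b[i][i] for i in range(n))  # trace = sum of all eigenvalues
    if total <= 0.0:
        return 0.0
    captured = 0.0
    for axis in range(n_axes):
        # Power iteration from a fixed start vector finds the dominant eigenpair.
        v = [float(i + 1) for i in range(n)]
        for step in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        lam = sum(v[i] * bv[i] for i in range(n)) / vv  # Rayleigh quotient
        captured += lam
        # Deflate: remove the axis just found before searching for the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j] / vv
    return captured / total

# Toy example: five points in a plane, so two axes capture all variability.
pts = [(0, 0), (3, 0), (0, 4), (3, 4), (1, 1)]
toy_dist = [[math.dist(p, q) for q in pts] for p in pts]
share_two_axes = pcoa_explained(toy_dist, n_axes=2)
```

For ecological dissimilarities that are not Euclidean, negative eigenvalues can appear and the trace no longer equals the sum of the positive eigenvalues, so a full eigendecomposition from a numerical library would be the safer choice in practice.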

CONCLUSIONS

The classification results suggested that microbiome composition was distinctive across the cities and could be used to identify sample origins. This was also supported by the ANCOM results and the importance scores from the Random Forest (RF). In addition, prediction accuracy could be improved with more samples and greater sequencing depth.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dec/7780616/e77734253db1/13062_2020_284_Fig1_HTML.jpg
