Zhelyazkova Maya, Yordanova Roumyana, Mihaylov Iliyan, Kirov Stefan, Tsonev Stefan, Danko David, Mason Christopher, Vassilev Dimitar
Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Sofia, Bulgaria.
Department of Mathematics, Hokkaido University, Sapporo, Japan.
Front Genet. 2021 Mar 4;12:642991. doi: 10.3389/fgene.2021.642991. eCollection 2021.
The steady development of the Metagenomics and Metadesign of Subways and Urban Biomes (MetaSUB) international consortium project raises important new questions about the origin, variation, and antimicrobial resistance of the collected samples. The CAMDA (Critical Assessment of Massive Data Analysis, http://camda.info/) forum organizes annual challenges in which different bioinformatics and statistical approaches are tested on samples collected around the world for bacterial classification and prediction of geographical origin. This work proposes a method that not only predicts the locations of unknown samples but also estimates the relative risk of antimicrobial resistance through spatial modeling. We introduce a new component into the standard analysis by applying a Bayesian spatial convolution model, which accounts for the spatial structure of the data as defined by the longitude and latitude of the samples, and we assess the relative risk of antimicrobial-resistant taxa across regions, a quantity relevant to public health. The estimated relative risk can then serve as a new measure of antimicrobial resistance. We also compare the performance of several machine learning methods (Gradient Boosting Machine, Random Forest, and Neural Network) in predicting the geographical origin of the mystery samples. All three methods give consistent results, with the Random Forest classifier performing somewhat better. In future work, we may consider a broader class of spatial models and incorporate covariates describing the environment and climate profiles of the samples, in order to estimate the relative risk of antimicrobial resistance more reliably.
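The abstract names a Bayesian spatial convolution model but does not write it out. As a hedged illustration only, a common formulation of such a model is the Besag-York-Mollié convolution model sketched below. Whether the authors used exactly this likelihood and these priors is an assumption, and all symbols ($O_i$, $E_i$, $\theta_i$, $\alpha$, $u_i$, $v_i$, $n_i$, $\sigma_u^2$, $\sigma_v^2$) are introduced here for illustration and do not come from the source.

```latex
% Illustrative sketch (assumed form, not taken from the paper).
% O_i: observed count of antimicrobial-resistant taxa in region i
% E_i: expected count in region i
% theta_i: relative risk in region i
\begin{align}
  O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i \, \theta_i), \\
  \log \theta_i &= \alpha + u_i + v_i, \\
  % spatially structured effect: intrinsic CAR prior over the n_i neighbours of region i
  u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\sigma_u^2}{n_i} \right), \\
  % unstructured effect: independent region-level heterogeneity
  v_i &\sim \mathcal{N}\!\left( 0,\; \sigma_v^2 \right).
\end{align}
```

Under this assumed form, the "convolution" is the sum $u_i + v_i$ of a spatially structured and an unstructured random effect, where $j \sim i$ denotes regions neighbouring region $i$. A posterior estimate $\theta_i > 1$ would indicate a region with higher-than-expected risk, and $\theta_i < 1$ a region with lower-than-expected risk.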