Giulia Agostinetto, Anna Sandionigi, Antonia Bruno, Dario Pescini, Maurizio Casiraghi
Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy.
Quantia Consulting Srl, Milan, Italy.
Front Bioinform. 2022 Jan 10;1:794547. doi: 10.3389/fbinf.2021.794547. eCollection 2021.
Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in many fields. In particular, machine learning techniques are increasingly used in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns in 16S rRNA metabarcoding data, we explored the effectiveness of Association Rule Mining (ARM), an unsupervised machine learning technique, for extracting patterns (in this work, groups of species or taxa) from microbiome data. ARM can generate huge numbers of rules, which makes removing spurious information and visualizing results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, particularly in 16S rRNA microbiome datasets, by applying ARM to real case studies and providing guidelines for future use. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, and identified the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. Specifically, to promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with common microbiome outputs, presenting them in forms analogous to standard microbiome workflows so that scientists can integrate ARM into microbiome applications. With this work, we aimed to build a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.
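The ARM workflow summarized above (turning a taxa table into presence/absence "transactions", mining frequent sets of co-occurring taxa, then pruning rules with interest measures) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not microFIM's code or API: the function names, the presence threshold, the toy counts for `taxon_A`/`taxon_B`/`taxon_C`, and the use of lift as the pruning measure are all hypothetical choices made for the example.

```python
from itertools import combinations

# Hypothetical toy taxa table: {sample: {taxon: read count}}. Invented data.
taxa_table = {
    "S1": {"taxon_A": 120, "taxon_B": 0, "taxon_C": 40},
    "S2": {"taxon_A": 80, "taxon_B": 5, "taxon_C": 30},
    "S3": {"taxon_A": 0, "taxon_B": 60, "taxon_C": 0},
    "S4": {"taxon_A": 95, "taxon_B": 0, "taxon_C": 22},
}


def to_transactions(table, threshold=0):
    """Binarize each sample into the set of taxa whose count exceeds threshold."""
    return [frozenset(t for t, n in counts.items() if n > threshold)
            for counts in table.values()]


def support(itemset, transactions):
    """Fraction of samples that contain every taxon in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for taxon sets with support >= min_support."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    frequent = {s: support(s, transactions) for s in current}
    k = 2
    while current:
        # Candidate k-sets are unions of two frequent (k-1)-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates if support(c, transactions) >= min_support]
        frequent.update({c: support(c, transactions) for c in current})
        k += 1
    return frequent


def association_rules(frequent, min_confidence, min_lift):
    """Derive rules X -> Y and keep those passing confidence and lift thresholds."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                # Subsets of a frequent set are frequent, so both lookups succeed.
                confidence = supp / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


transactions = to_transactions(taxa_table)
rules = association_rules(frequent_itemsets(transactions, min_support=0.5),
                          min_confidence=0.8, min_lift=1.2)
for ante, cons, s, c, lift in rules:
    print(f"{sorted(ante)} -> {sorted(cons)}  "
          f"support={s:.2f} confidence={c:.2f} lift={lift:.2f}")
```

On these toy counts only the co-occurrence of `taxon_A` and `taxon_C` survives, yielding the rules `taxon_A -> taxon_C` and `taxon_C -> taxon_A` with lift of about 1.33. The brute-force support counting keeps the sketch readable; a real tool would use an optimized itemset miner and, as the abstract notes, additional interest measures to filter spurious rules.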