Wang Nan, Wang Teng, Ning Kang
Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China.
Environ Sci Ecotechnol. 2023 Jul 26;17:100304. doi: 10.1016/j.ese.2023.100304. eCollection 2024 Jan.
Microbiome research has generated a large volume of data and a wealth of publicly accessible samples. Accurate annotation of these samples is crucial for using microbiome data effectively across scientific disciplines. However, many samples lack essential annotations, particularly collection location and sample biome information, which significantly hinders environmental microbiome research. In this study, we introduce Meta-Sorter, an approach based on neural networks and transfer learning, to improve biome labeling for thousands of microbiome samples in the MGnify database that have incomplete information. Meta-Sorter achieved an accuracy of 96.7% in classifying samples among the 16,507 that lack detailed biome annotations. Notably, Meta-Sorter provides precise classifications for representative environmental samples that were previously labeled only as "Marine" in MGnify, resolving their specific origins in benthic and water column environments. Meta-Sorter also distinguishes samples derived from human-environment interactions, enabling clear differentiation between environmental and human-related studies. By improving the completeness of biome labels for numerous microbial community samples, our research facilitates more accurate knowledge discovery across diverse disciplines, with particular implications for environmental research.