Using neural networks to mine text and predict metabolic traits for thousands of microbes.

Affiliation

Department of Animal Science, University of California, Davis, United States of America.

Publication information

PLoS Comput Biol. 2021 Mar 2;17(3):e1008757. doi: 10.1371/journal.pcbi.1008757. eCollection 2021 Mar.

Abstract

Microbes can metabolize more chemical compounds than any other group of organisms. As a result, their metabolism is of interest to investigators across biology. Despite this interest, information on the metabolism of specific microbes is hard to access. It is buried in the text of books and journals, and investigators have no easy way to extract it. Here we investigate whether neural networks can extract this information and predict metabolic traits. As a proof of concept, we predicted two traits: whether a microbe carries out one type of metabolism (fermentation) and whether it produces one metabolite (acetate). We collected written descriptions of 7,021 species of bacteria and archaea from Bergey's Manual. We read the descriptions and manually identified (labeled) which species were fermentative or produced acetate. In total, we identified 2,364 species as fermentative, and 1,009 species as also producing acetate. We then trained neural networks to predict these labels. Neural networks predicted which species were fermentative with 97.3% accuracy. Accuracy was even higher (98.6%) when predicting which species also produced acetate. Phylogenetic trees of species and their traits confirmed that the predictions were accurate. Our neural-network approach can extract information efficiently and accurately. It paves the way for adding more metabolic traits to databases, giving investigators easy access to this information.
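The workflow described above (label species descriptions by hand, then train a network to reproduce the labels) can be illustrated with a minimal sketch. It assumes a binary bag-of-words encoding and a single sigmoid neuron (equivalent to logistic regression) trained by stochastic gradient descent. This is a deliberately simple stand-in, not the architecture, preprocessing, or data the authors used; the descriptions, labels, and function names below are hypothetical.

```python
import math
import re

# Hypothetical toy data standing in for species descriptions from Bergey's
# Manual. Sentences and labels are invented: 1 = fermentative, 0 = not.
TRAIN = [
    ("Ferments glucose with production of acetate and lactate.", 1),
    ("Anaerobic; fermentation products include acetate.", 1),
    ("Glucose is fermented to acid and gas.", 1),
    ("Strictly aerobic; oxidative metabolism, uses oxygen as electron acceptor.", 0),
    ("Chemolithotrophic; oxidizes sulfur compounds aerobically.", 0),
    ("Obligately aerobic; respiratory metabolism with oxygen.", 0),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs: list[str]) -> dict[str, int]:
    """Map every training token to a fixed feature index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def featurize(text: str, vocab: dict[str, int]) -> list[float]:
    """Binary bag-of-words vector; tokens not seen in training are ignored."""
    x = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            x[vocab[tok]] = 1.0
    return x


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x: list[float], w: list[float], b: float) -> float:
    """Output of a single sigmoid neuron: probability of label 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(data, vocab, epochs: int = 500, lr: float = 0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    w, b = [0.0] * len(vocab), 0.0
    examples = [(featurize(text, vocab), label) for text, label in data]
    for _ in range(epochs):
        for x, y in examples:
            err = score(x, w, b) - y  # d(log loss)/d(pre-activation)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(text: str, w, b, vocab) -> int:
    """Threshold the neuron's output at 0.5 to get a 0/1 label."""
    return 1 if score(featurize(text, vocab), w, b) >= 0.5 else 0


def accuracy(data, w, b, vocab) -> float:
    """Fraction of (text, label) pairs whose predicted label is correct."""
    return sum(predict(t, w, b, vocab) == y for t, y in data) / len(data)


if __name__ == "__main__":
    vocab = build_vocab([text for text, _ in TRAIN])
    w, b = train(TRAIN, vocab)
    print("training accuracy:", accuracy(TRAIN, w, b, vocab))
    print("new description ->", predict("Ferments lactose; acetate is a major product.", w, b, vocab))
```

Accuracy on the training examples says nothing about how well such a model generalizes; meaningful accuracy figures come from descriptions held out during training. A real text classifier would also need to handle negations such as "does not ferment", which a plain bag of words cannot distinguish from "ferments".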

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a29/7954334/3fbb26ca783c/pcbi.1008757.g001.jpg