Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning.

Author information

Liu Bohan, Nan Jun, Zu Xuehui, Zhang Xinhui, Xiao Qiliang

Affiliation

State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, China.

Publication information

Front Cell Dev Biol. 2021 Jan 18;8:626221. doi: 10.3389/fcell.2020.626221. eCollection 2020.

Abstract

In sewage treatment, the identification of polyphosphate-accumulating organisms (PAOs) usually relies on biological experiments, which are complicated, time-consuming, and costly. In recent years, machine learning has been widely used in many fields, but it is seldom used in water treatment. This work presents a high-accuracy support vector machine (SVM) algorithm for the rapid identification and prediction of PAOs. We obtained 6,318 microbial genome sequences from the publicly available Microbial Genome Database for Comparative Analysis (MBGD). Minimap2 was used to compare the obtained genomes in pairs and to read the overlaps between them. The SVM model was built on the similarity of the genome sequences. With 10-fold cross-validation, the model reached an average accuracy of 0.9628 ± 0.019. Applying the model to 2,652 microorganisms yielded 22 potential PAOs. For most of these, the phosphorus removal characteristics could be indirectly verified from previous reports. The SVM model shows high prediction accuracy and good stability.
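The abstract does not say how the Minimap2 overlaps were turned into similarity values, so the following is only a minimal sketch of one plausible approach, not the authors' method. It reads records in Minimap2's PAF format (tab-separated; column 2 is the query length and column 10 the number of matching residues) and, as an assumption made here, defines similarity as the summed matching residues of a genome pair divided by the query length, capped at 1.0. The sketch also assumes each genome is a single sequence. The function names and the example records are hypothetical.

```python
from collections import defaultdict

# Column indices in Minimap2's PAF output (0-based).
QUERY_NAME, QUERY_LEN = 0, 1
TARGET_NAME = 5
N_MATCHES = 9


def similarity_from_paf(paf_lines):
    """Return {(query, target): similarity} from PAF lines.

    Assumption (not from the paper): similarity is the number of matching
    residues summed over all alignments of a pair, divided by the query
    length and capped at 1.0.
    """
    matches = defaultdict(int)
    query_len = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        q, t = fields[QUERY_NAME], fields[TARGET_NAME]
        query_len[q] = int(fields[QUERY_LEN])
        matches[(q, t)] += int(fields[N_MATCHES])
    return {pair: min(1.0, n / query_len[pair[0]]) for pair, n in matches.items()}


def feature_vector(similarity, genome, references):
    """Similarity of one genome to each reference genome (0.0 if no overlap)."""
    return [similarity.get((genome, ref), 0.0) for ref in references]


# Hypothetical PAF records, invented purely for illustration.
example_paf = [
    "gA\t1000\t0\t500\t+\tgB\t1200\t100\t600\t450\t500\t60",
    "gA\t1000\t600\t900\t+\tgB\t1200\t700\t1000\t250\t300\t60",
    "gA\t1000\t0\t200\t-\tgC\t800\t0\t200\t100\t200\t30",
]
sim = similarity_from_paf(example_paf)
print(feature_vector(sim, "gA", ["gB", "gC", "gD"]))  # [0.7, 0.1, 0.0]
```

Each genome's vector of similarities to a fixed set of reference genomes could then serve as the input features of an SVM classifier, with accuracy averaged over the ten folds of a cross-validation.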

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aad2/7848102/8451deaa1914/fcell-08-626221-g0001.jpg
