Department of Environmental Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Department of Biological and Ecological Engineering, Oregon State University, Corvallis OR 97331, USA.
Department of Biological and Ecological Engineering, Oregon State University, Corvallis OR 97331, USA.
Biosens Bioelectron. 2019 May 15;133:64-71. doi: 10.1016/j.bios.2019.03.021. Epub 2019 Mar 13.
The complicated interactions that occur in mixed-species biotechnologies, including biosensors, hinder chemical detection specificity. This lack of specificity limits the applications in which biosensors can be deployed, such as those where an unknown feed substrate must be identified. Applying genomic data and well-developed data mining technologies can overcome these limitations and advance engineering development. In the present study, 69 samples representing three different substrate types (acetate, carbohydrates, and wastewater), collected from various laboratory environments, were evaluated to determine whether feed substrates could be identified from the resultant microbial communities. Six machine learning algorithms with four different input variables were trained and evaluated on their ability to predict feed substrate from genomic datasets. The highest accuracies, 93 ± 6% and 92 ± 5%, were obtained using NNET trained on datasets classified at the phylum and family taxonomic levels, respectively. These accuracies corresponded to kappa values of 0.87 ± 0.10 and 0.86 ± 0.09, respectively. Four of the six algorithms maintained accuracies above 80% and kappa values higher than 0.66. The sequencing method (Roche 454 or Illumina) did not affect the accuracy of any algorithm except SVM at the phylum level. All algorithms trained on NMDS-compressed datasets obtained accuracies over 80%, while models trained on PCoA-compressed datasets showed a 10-30% reduction in accuracy. These results suggest that combining microbial community data with machine learning algorithms can be used to predict feed substrate and potentially to improve MFC-based biosensor signal specificity, providing a new use of machine learning techniques with substantial practical applications in biotechnological fields.
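The abstract reports each model's performance as an accuracy paired with a kappa value. As a minimal sketch of how the two relate, the Python below computes plain accuracy and Cohen's kappa, which corrects accuracy for the agreement expected by chance given the class frequencies. The function name `accuracy_and_kappa` and the twelve labels are hypothetical and purely illustrative; they are not data or code from the study, and the study's own evaluation procedure is not described here.

```python
import math
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(true_labels)

    # Observed agreement: fraction of samples classified correctly (accuracy).
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n

    # Chance agreement: for each class, the product of its share among the
    # true labels and its share among the predicted labels, summed over classes.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Only one class appears in both sequences; kappa is undefined.
        return observed, math.nan

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical example: 12 samples, 4 per substrate class, one misclassified.
true_substrates = ["acetate"] * 4 + ["carbohydrate"] * 4 + ["wastewater"] * 4
predicted_substrates = (
    ["acetate"] * 4 + ["carbohydrate"] * 4 + ["wastewater"] * 3 + ["carbohydrate"]
)

acc, kappa = accuracy_and_kappa(true_substrates, predicted_substrates)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

With three balanced classes, chance agreement is about one third, so kappa comes out somewhat lower than the raw accuracy, which matches the pattern in the reported figures (for example, 93% accuracy alongside a kappa of 0.87).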