Orcales Faye, Moctezuma Tan Lucy, Johnson-Hagler Meris, Suntay John Matthew, Ali Jameel, Recto Kristiene, Glenn Phelan, Pennings Pleuni
Department of Biology, San Francisco State University, San Francisco, California, United States of America.
University of California San Francisco, San Francisco, California, United States of America.
PLoS Comput Biol. 2024 Dec 30;20(12):e1012579. doi: 10.1371/journal.pcbi.1012579. eCollection 2024 Dec.
Antibiotic resistance is a global public health concern. Bacteria have evolved resistance to most antibiotics, so the bacteria causing any given infection may be resistant to one or several antibiotics. It has been suggested that genomic sequencing and machine learning (ML) could make resistance testing more accurate and cost-effective. Because ML is likely to become an increasingly important tool in medicine, we believe that pre-health students and others in the life sciences should learn to use ML tools. This paper provides a step-by-step tutorial for training four different ML models (logistic regression, random forests, extreme gradient-boosted trees, and neural networks) to predict drug resistance in Escherichia coli isolates, and for evaluating their performance using different metrics and cross-validation techniques. We also guide the user through loading and preparing the data used for the ML models. The tutorial is accessible to beginners, provides a basic understanding of the different ML models, and requires no software installation because it is based on Google Colab notebooks. The tutorial can be used in undergraduate and graduate classes for students in Biology, Public Health, Computer Science, or related fields.
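To make the core workflow concrete (prepare a feature matrix, train a classifier, score it with several metrics under k-fold cross-validation), here is a minimal, standard-library-only sketch for one of the four model types, logistic regression. It is not the tutorial's code or data: the toy data generator, the assumption that isolates are encoded as 0/1 gene presence/absence features, the rule that a single hypothetical gene confers resistance, and all function names, the learning rate and the epoch count are assumptions chosen for illustration. In practice a library implementation would normally replace the hand-written training loop.

```python
import math
import random


def make_toy_data(n_isolates=200, n_genes=5, seed=0):
    """Hypothetical toy data: rows are isolates, columns are 0/1 gene presence.
    Assumption: carrying gene 0 makes an isolate resistant (label 1);
    about 5% of labels are flipped to mimic noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_isolates):
        genes = [rng.randint(0, 1) for _ in range(n_genes)]
        label = genes[0]
        if rng.random() < 0.05:
            label = 1 - label
        X.append(genes)
        y.append(label)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=300):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold else 0
            for xi in X]


def metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on resistant) and specificity (recall on susceptible)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def k_fold_cross_validate(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and train on k-1 folds / test on the held-out one."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    results = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        w, b = train_logistic_regression([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_pred = predict(w, b, [X[j] for j in test_idx])
        results.append(metrics([y[j] for j in test_idx], y_pred))
    return results


# Usage: 5-fold cross-validation on the hypothetical toy data.
X, y = make_toy_data()
fold_scores = k_fold_cross_validate(X, y, k=5)
mean_acc = sum(s["accuracy"] for s in fold_scores) / len(fold_scores)
print(f"mean accuracy across {len(fold_scores)} folds: {mean_acc:.2f}")
```

Reporting sensitivity and specificity alongside accuracy matters here because, for resistance prediction, missing a resistant isolate and mislabeling a susceptible one have different clinical costs; averaging each metric over the held-out folds gives a less optimistic estimate than scoring on the training data.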