Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China.
School of Mathematics and Statistics, Shaanxi Normal University, Xi'an, China.
Front Cell Infect Microbiol. 2021 Oct 15;11:742062. doi: 10.3389/fcimb.2021.742062. eCollection 2021.
Candida auris is an emerging fungus associated with high morbidity. It has a unique transmission ability and is often resistant to multiple drugs. In this study, we evaluated the ability of different machine learning models to classify drug resistance, and we predicted and ranked the drug resistance mutations of C. auris. Two C. auris strains were obtained. These were combined with 356 other strains collected from the European Bioinformatics Institute (EBI) databases, and the whole genome sequencing (WGS) data were analyzed bioinformatically. Machine learning classifiers were used to build drug resistance models, which were evaluated and compared using several evaluation methods based on the AUC value. Briefly, the two strains were assigned to Clade III in the phylogenetic tree, which was consistent with previous studies; nevertheless, the phylogenetic tree was not completely consistent with the previously reported clustering by geographical location. The clustering of C. auris strains was related to their drug resistance. The resistance genes of C. auris were not under additional strong selection pressure, and the performance of the different models varied greatly between drugs. For drugs such as azoles and echinocandins, the models performed relatively well. In addition, two machine learning algorithms, based on a balanced test set and an imbalanced test set, were designed and evaluated; for most drugs, the evaluation results on the balanced test set were better than those on the imbalanced test set. The mutations strongly associated with drug resistance in C. auris were predicted and ranked by Recursive Feature Elimination with Cross-Validation (RFECV) combined with a machine learning classifier.
In addition to known drug resistance mutations, some new resistance mutations were predicted, such as the Y501H and I466M mutations in one gene and the R278H mutation in another gene, which may be associated with fluconazole (FCZ), micafungin (MCF), and amphotericin B (AmB) resistance, respectively; these mutations were in the "hot spot" regions of the ergosterol pathway. In summary, this study suggests that machine learning classifiers are a useful and cost-effective way to identify mutations related to fungal drug resistance, which is of great significance for research on the resistance mechanism of Candida auris.
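The AUC-based comparison on balanced and imbalanced test sets described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `roc_auc` and `balanced_subset`, the downsampling strategy, and the toy labels and scores are all assumptions made for illustration only.

```python
import random

def roc_auc(labels, scores):
    """AUC as the fraction of (resistant, susceptible) pairs in which the
    resistant isolate receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one isolate of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def balanced_subset(labels, scores, seed=0):
    """Downsample the larger class so both classes are equally represented
    (hypothetical helper; the study may have balanced its test set differently)."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos_idx), len(neg_idx))
    keep = sorted(rng.sample(pos_idx, k) + rng.sample(neg_idx, k))
    return [labels[i] for i in keep], [scores[i] for i in keep]

# Toy data (invented): 1 = resistant, 0 = susceptible; scores are
# predicted probabilities of resistance from some classifier.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05]

auc_imbalanced = roc_auc(labels, scores)
bal_labels, bal_scores = balanced_subset(labels, scores, seed=0)
auc_balanced = roc_auc(bal_labels, bal_scores)
print(f"imbalanced AUC: {auc_imbalanced:.3f}, balanced AUC: {auc_balanced:.3f}")
```

With real data, the scores would come from a trained classifier, and the feature ranking step could be done with an RFECV implementation such as the one in scikit-learn; whether the study used that library is not stated in the abstract.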