Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resources and Eco-Environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610064, China.
Key Laboratory of Bio-Resources and Eco-Environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610064, China.
BMC Genomics. 2024 Feb 12;25(1):167. doi: 10.1186/s12864-024-10066-y.
The most widely practiced strategy for constructing deep learning (DL) models that predict drug resistance in Mycobacterium tuberculosis (MTB) is to adopt ready-made, state-of-the-art architectures that were usually proposed for non-biological problems. However, the ultimate goal is to construct a customized model for predicting drug resistance in MTB and, eventually, for predicting biological phenotypes from genotypes. Here, we constructed a DL training framework that standardizes and modularizes each step of the training process using the latest TensorFlow 2 API. Within this framework, we systematically and comprehensively evaluated the contribution of each module in three currently representative models, namely Convolutional Neural Network, Denoising Autoencoder, and Wide & Deep, which were adopted by CNNGWP, DeepAMR, and WDNN, respectively, in order to assemble a novel model from appropriate dedicated modules. Using whole-genome-level mutations, we developed a de novo learning method to overcome the intrinsic limitation of previous models, which rely on known drug resistance-associated loci. A customized DL model with a multilayer perceptron architecture was constructed and achieved performance competitive with previous models (mean sensitivity 0.90 and mean specificity 0.87). The new model was implemented in an end-to-end, user-friendly graphical tool named TB-DROP (TuBerculosis Drug Resistance Optimal Prediction: https://github.com/nottwy/TB-DROP), in which users only need to provide sequencing data, and TB-DROP completes the analysis of one sample within several minutes. Our study contributes to both a new strategy for model construction and the clinical application of deep learning-based drug-resistance prediction methods.
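For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute sensitivity and specificity for each drug and then average them. It is an illustration only: the abstract does not state how the means were computed, so the unweighted average over drugs is an assumption, and the function names, drug names, and toy labels are hypothetical rather than taken from TB-DROP or its data.

```python
from typing import Dict, List, Tuple


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = resistant, 0 = susceptible.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: fraction of resistant isolates predicted resistant.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of susceptible isolates predicted susceptible.
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def mean_metrics(per_drug: Dict[str, Tuple[List[int], List[int]]]) -> Tuple[float, float]:
    """Unweighted mean of per-drug sensitivity and specificity (an assumption)."""
    pairs = [sensitivity_specificity(t, p) for t, p in per_drug.values()]
    mean_sens = sum(s for s, _ in pairs) / len(pairs)
    mean_spec = sum(c for _, c in pairs) / len(pairs)
    return mean_sens, mean_spec


# Hypothetical toy labels for two placeholder drugs: (true labels, predicted labels).
toy = {
    "drug_A": ([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]),
    "drug_B": ([1, 0, 0, 0], [1, 0, 0, 1]),
}

mean_sens, mean_spec = mean_metrics(toy)
print(f"mean sensitivity={mean_sens:.3f}, mean specificity={mean_spec:.3f}")
```

An unweighted mean treats every drug equally regardless of how many isolates were tested for it; a sample-weighted mean would be an equally plausible reading of the reported figures.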