Department of Biomedical Informatics, Harvard Medical School, 25 Shattuck St, Boston, MA, 02115, USA.
Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK.
Nat Commun. 2022 Jul 2;13(1):3817. doi: 10.1038/s41467-022-31236-0.
Long diagnostic wait times hinder international efforts to address antibiotic resistance in Mycobacterium tuberculosis. Pathogen whole genome sequencing, coupled with statistical and machine learning models, offers a promising solution. However, generalizability and clinical adoption have been limited by a lack of interpretability, especially in deep learning methods. Here, we present two deep convolutional neural network (CNN) approaches that predict antibiotic resistance phenotypes of M. tuberculosis isolates: a multi-drug CNN (MD-CNN) that predicts resistance to 13 antibiotics based on 18 genomic loci, with AUCs of 82.6-99.5% and higher sensitivity than state-of-the-art methods; and a set of 13 single-drug CNNs (SD-CNN) with AUCs of 80.1-97.1% and higher specificity than the previous state-of-the-art. Using saliency methods to evaluate the contribution of input sequence features to the SD-CNN predictions, we identify 18 sites in the genome not previously associated with resistance. The CNN models permit functional variant discovery, biologically meaningful interpretation, and clinical applicability.
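To make the idea of scoring input sequence features concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy single-filter network (convolution, ReLU, global max pool, sigmoid) in place of the deep CNNs, and it swaps in in-silico mutagenesis as a simple stand-in attribution technique, since the abstract does not name the specific saliency method. The names `one_hot`, `toy_cnn`, `mutagenesis_saliency`, the hand-set filter, and the example sequence are all hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) array; unknown bases (e.g. N) stay all-zero."""
    x = np.zeros((len(seq), 4), dtype=np.float64)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x

def toy_cnn(x, kernel, bias=0.0):
    """Toy model (assumption): 1D convolution -> ReLU -> global max pool -> sigmoid."""
    k = kernel.shape[0]
    # Slide a window of width k along the sequence: shape (L - k + 1, k, 4).
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    acts = np.maximum(0.0, np.einsum("nkc,kc->n", windows, kernel) + bias)
    return 1.0 / (1.0 + np.exp(-acts.max()))

def mutagenesis_saliency(seq, kernel, bias=0.0):
    """Per-position score: largest change in the prediction over all single-base substitutions."""
    seq = seq.upper()
    base_score = toy_cnn(one_hot(seq), kernel, bias)
    saliency = np.zeros(len(seq))
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            delta = abs(toy_cnn(one_hot(mutated), kernel, bias) - base_score)
            saliency[i] = max(saliency[i], delta)
    return saliency

# Hypothetical filter that responds only to an exact "GAT" match (3 matches - 2 = 1).
kernel = one_hot("GAT")
bias = -2.0
seq = "ACCGATTC"

sal = mutagenesis_saliency(seq, kernel, bias)
important = np.nonzero(sal > 1e-9)[0].tolist()
print(important)  # positions whose mutation changes the prediction
```

In this toy setting only the three positions covered by the motif receive a nonzero score. With a trained network the same per-position scores would be inspected for sites that are not already known resistance variants.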