State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, Department of Respiratory Medicine, First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830000, Xinjiang, China.
Department of Respiratory Medicine, First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830011, Xinjiang, China.
BMC Infect Dis. 2022 Aug 25;22(1):707. doi: 10.1186/s12879-022-07694-8.
Tuberculosis (TB) was the leading cause of death from infectious disease worldwide from 2014 to 2019, until the COVID-19 pandemic, and it remains one of the top 10 causes of death globally. One important reason for the large number of TB cases and deaths is the difficulty of diagnosing TB precisely with common detection methods, especially in some smear-negative pulmonary tuberculosis (SNPT) cases. Rapid advances in metabolomics and machine learning offer a great opportunity for the precision diagnosis of TB. However, metabolite biomarkers for the precision diagnosis of smear-positive and smear-negative pulmonary tuberculosis (SPPT/SNPT) remain to be uncovered. In this study, we combined metabolomics and clinical indicators with machine learning to screen for new diagnostic biomarkers for the precise identification of SPPT and SNPT patients.
Untargeted plasma metabolomic profiling was performed for 27 SPPT patients, 37 SNPT patients and controls. Orthogonal partial least squares-discriminant analysis (OPLS-DA) was then conducted to screen for differential metabolites among the three groups. Metabolite pathway enrichment analysis, random forest (RF), support vector machine (SVM) and multilayer perceptron neural network (MLP) analyses were performed using MetaboAnalyst 5.0, the "caret" R package, the "e1071" R package and the "TensorFlow" Python package, respectively.
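As a rough illustration of how three classifiers of this kind can be compared on a three-group problem, the following is a minimal sketch only. It assumes scikit-learn in place of the caret, e1071 and TensorFlow packages named above, uses a synthetic feature table rather than the study's data, and picks all hyperparameters, sample counts and the 5-fold cross-validation scheme arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a feature table: 90 samples, 10 features, 3 classes.
# Hypothetical values only; not the study's data or group sizes.
X, y = make_classification(
    n_samples=90,
    n_features=10,
    n_informative=6,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Assumed, untuned settings. SVM and MLP are scaled first because both are
# sensitive to feature magnitude; random forest is not.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Stratified folds keep the three class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in scores.items():
    print(f"{name}: mean cross-validated accuracy = {acc:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the accuracies reported in the study.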
Metabolomic analysis revealed significant enrichment of fatty acid and amino acid metabolites in the plasma of SPPT and SNPT patients, with SPPT samples showing more severe dysregulation of fatty acid and amino acid metabolism. Further RF analysis identified four optimized diagnostic biomarker combinations comprising ten features (two lipids/lipid-like molecules, seven organic acids/derivatives and one clinical indicator) that distinguished SPPT patients, SNPT patients and controls with high accuracy (83-93%); these were further verified by SVM and MLP. Among the three methods, MLP showed the best performance in simultaneously classifying the three groups (94.74%), suggesting that MLP has some advantage over RF and SVM.
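For readers unfamiliar with how a three-group accuracy figure is obtained, it is the fraction of samples whose predicted group matches the true group. The sketch below shows that calculation, plus per-group recall, on made-up labels. The label lists, group counts and function names are hypothetical and are not taken from the study.

```python
from collections import Counter

GROUPS = ["SPPT", "SNPT", "control"]


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted group equals the true group."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_group_recall(y_true, y_pred):
    """For each group, the fraction of its true members predicted correctly."""
    pair_counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    totals = Counter(y_true)                    # true group -> count
    return {g: pair_counts[(g, g)] / totals[g] for g in GROUPS if totals[g]}


# Made-up labels for illustration only: four samples per group.
y_true = ["SPPT"] * 4 + ["SNPT"] * 4 + ["control"] * 4
y_pred = [
    "SPPT", "SPPT", "SPPT", "SNPT",           # one SPPT sample misclassified
    "SNPT", "SNPT", "SNPT", "SNPT",           # all SNPT samples correct
    "control", "control", "control", "SNPT",  # one control misclassified
]

acc = overall_accuracy(y_true, y_pred)
recall = per_group_recall(y_true, y_pred)

print(f"accuracy = {acc:.2%}")  # 10 of 12 correct -> accuracy = 83.33%
for group, value in recall.items():
    print(f"{group}: recall = {value:.2f}")
```

Reporting per-group recall alongside overall accuracy is useful when group sizes differ, because a single accuracy value can hide poor performance on the smallest group.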
Our findings reveal the plasma metabolomic characteristics of SPPT and SNPT patients, provide some novel and promising diagnostic markers for the precision diagnosis of different types of TB, and show the potential of machine learning for screening biomarkers from big data.