Department of Mathematics, Simon Fraser University, Burnaby, Canada.
British Columbia Centre for Disease Control, Vancouver, Canada.
Sci Rep. 2024 Nov 12;14(1):27652. doi: 10.1038/s41598-024-78247-z.
Identifying individuals with tuberculosis (TB) who are at high risk of onward transmission can guide disease prevention and public health strategies. Here, we train classification models to predict, from demographic and disease data, which isolates are the first sampled in Mycobacterium tuberculosis transmission clusters. We find that supervised learning, in particular balanced random forests, can be used to develop predictive models that identify people with TB who are more likely to be associated with TB cluster growth, with good model performance (AUCs of ≥ 0.75). We also identify the most important patient and disease characteristics in the best-performing classification model, including host demographics, site of infection, TB lineage, and age at diagnosis. This framework can be used to develop predictive tools for the early assessment of potential cluster growth, so that individuals can be prioritised for enhanced follow-up with the aim of reducing transmission chains.
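As a rough illustration of two ideas named in the abstract, the sketch below shows the balanced resampling step that a balanced random forest typically applies before fitting each tree, and a rank-based way to compute an AUC. It is a minimal sketch and not the authors' pipeline: the function names `balanced_bootstrap` and `auc`, the toy labels, and the toy scores are all assumptions made up for illustration.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return row indices with equal numbers of positives and negatives.

    Both classes are sampled with replacement, each as many times as the
    smaller class has rows. A balanced random forest would fit one tree
    per such sample (the tree fitting itself is omitted here).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = first sampled isolate of a cluster, 0 = otherwise.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
# Hypothetical model scores, invented for this example.
scores = [0.9, 0.2, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2, 0.5, 0.8]

sample = balanced_bootstrap(labels, random.Random(0))
toy_auc = auc(labels, scores)
meets_threshold = toy_auc >= 0.75  # the performance level quoted in the abstract
```

Resampling each class to the same size keeps the rarer class from being swamped during training, which is the usual motivation for the balanced variant when the outcome of interest is uncommon.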