Department of Biostatistics, Boston University, Boston, MA, USA.
Division of Computational Biomedicine and Bioinformatics Program, Boston University, Boston, MA, USA.
BMC Infect Dis. 2024 Jun 20;24(1):610. doi: 10.1186/s12879-024-09457-z.
Blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use in diagnosing disease. However, an unresolved issue is whether gene set enrichment analysis of the signature transcripts alone is sufficient for prediction and differentiation, or whether it is necessary to use the original model created when the signature was derived. Comparison across methods is complicated by the unavailability of the original training data and by missing details about the original trained models. To facilitate the use of these signatures in TB research, comparisons between gene set scoring methods and cross-data validation of the original model implementations are needed.
We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods. Existing gene set scoring methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, were used as alternative approaches to obtain the profile scores. The area under the receiver operating characteristic (ROC) curve (AUC) was computed to measure performance. Correlation analysis and paired Wilcoxon tests were used to compare the performance of the enrichment methods with that of the original models.
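Scoring a signature with a gene set method and then measuring its discrimination with AUC can be sketched in plain Python. This is a minimal sketch, not the authors' code. It assumes the combined z-score formulation for the Zscore method: each signature gene is standardized across samples, the z-scores are summed per sample, and the sum is divided by the square root of the number of genes. That is the form commonly associated with the Zscore option in GSVA. AUC is computed with the rank-based (Mann-Whitney) estimate. The function names, gene names, and toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore_gene_set_score(expr, gene_set):
    """Combined z-score for one gene set (assumed formulation).

    expr: dict mapping gene -> list of expression values, one per sample.
    Signature genes absent from expr are skipped.
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("none of the signature genes are in the data")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    for g in genes:
        values = expr[g]
        mu, sd = mean(values), stdev(values)  # sample SD across samples
        for j, v in enumerate(values):
            totals[j] += (v - mu) / sd if sd > 0 else 0.0
    return [t / math.sqrt(len(genes)) for t in totals]


def auc(scores, labels):
    """Rank-based AUC: probability that a random case (label 1) scores
    higher than a random control (label 0), counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 4 samples, 2 measured signature genes (not from the paper)
expr = {"GENE_A": [5.0, 6.0, 9.0, 10.0], "GENE_B": [2.0, 3.0, 7.0, 8.0]}
labels = [0, 0, 1, 1]  # 1 = active TB, 0 = other clinical condition
scores = zscore_gene_set_score(expr, ["GENE_A", "GENE_B", "GENE_C"])
print(auc(scores, labels))  # 1.0
```

The other scoring methods (ssGSEA, GSVA, PLAGE, Singscore) would replace only the scoring function. The AUC step is the same for every method, which is what makes the per-signature comparison possible.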
For many signatures, the predictions from gene set scoring methods were highly correlated and statistically equivalent to the results given by the original models. In some cases, PLAGE outperformed the original models when considering signatures' weighted mean AUC values and the AUC results within individual studies.
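The comparison above summarizes per-study AUCs across studies and contrasts methods study by study. The sketch below shows both steps: a weighted mean of per-study AUCs and an exact paired Wilcoxon signed-rank test. It assumes the weights are study sample sizes, because the abstract does not state the weighting. The test enumerates all 2^n sign patterns, so it is practical only for a small number of studies. For larger n, `scipy.stats.wilcoxon` is the usual alternative. Helper names and AUC values are hypothetical.

```python
from itertools import product


def weighted_mean_auc(aucs, weights):
    """Weighted average of per-study AUCs (weights assumed to be sample sizes)."""
    if not aucs or len(aucs) != len(weights):
        raise ValueError("need one weight per AUC")
    return sum(a * w for a, w in zip(aucs, weights)) / sum(weights)


def average_ranks(values):
    """1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Paired Wilcoxon signed-rank test; returns (W+, exact two-sided p).

    Zero differences are dropped. The null distribution is enumerated over
    all 2**n sign assignments of the ranks, so this is only for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((False, True), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** len(diffs)


# Hypothetical per-study AUCs for one signature (not data from the paper)
original = [0.80, 0.75, 0.90, 0.70, 0.85]
plage = [0.86, 0.78, 0.91, 0.77, 0.83]
sizes = [120, 60, 45, 80, 30]
print(weighted_mean_auc(original, sizes), weighted_mean_auc(plage, sizes))
print(wilcoxon_signed_rank_exact(plage, original))  # (13.0, 0.1875)
```

With only five hypothetical studies, even four wins out of five does not reach conventional significance. The study-level comparison therefore depends on having many datasets, not just on the direction of each difference.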
Gene set enrichment scoring of existing gene sets can distinguish patients with active TB disease from other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods of published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.