Comparison of gene set scoring methods for reproducible evaluation of multiple tuberculosis gene signatures.

Author information

Wang Xutao, VanValkenberg Arthur, Odom-Mabey Aubrey R, Ellner Jerrold J, Hochberg Natasha S, Salgame Padmini, Patil Prasad, Johnson W Evan

Affiliations

Department of Biostatistics, Boston University, Boston, MA, USA.

Division of Computational Biomedicine and Bioinformatics Program, Boston University, Boston, MA, USA.

Publication information

bioRxiv. 2023 Jan 30:2023.01.19.520627. doi: 10.1101/2023.01.19.520627.

RATIONALE

Many blood-based transcriptional gene signatures for tuberculosis (TB) have been developed, with potential use in diagnosing disease, predicting the risk of progression from infection to disease, and monitoring TB treatment outcomes. However, an unresolved issue is whether gene set enrichment analysis (GSEA) of the signature transcripts alone is sufficient for prediction and differentiation, or whether it is necessary to use the original statistical model created when the signature was derived. Intra-method comparison is complicated by the unavailability of original training data, missing details about the original trained model, and inadequate publicly available software tools or source code implementing the models. To facilitate the replicability and appropriate use of these signatures in TB research, comprehensive comparisons between gene set scoring methods, with cross-data validation of original model implementations, are needed.

OBJECTIVES

We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods, to evaluate whether gene set scoring is a reasonable proxy for the performance of the original trained model. We have provided an open-access software implementation of the original models for all 19 signatures for future use.

METHODS

We considered existing gene set scoring and machine learning methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, as alternative approaches to profile gene signature performance. The sample-size-weighted mean area under the curve (AUC) value was computed to measure each signature's performance across datasets. Correlation analysis and Wilcoxon paired tests were used to compare the performance of the enrichment methods with that of the original models.
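The scoring and summary steps described above can be illustrated with a minimal sketch. It is not the authors' software: the function names (`zscore_gene_set`, `auc`, `weighted_mean_auc`) and the toy data are hypothetical, the Z-score variant shown (per-gene standardization across samples, summed and divided by the square root of the number of genes) is one common formulation assumed here for illustration, and the AUC is computed from pairwise comparisons rather than with any particular package.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_gene_set(expr, signature):
    """Combined Z-score per sample for one gene signature (assumed variant).

    expr: dict mapping gene name -> list of expression values, one per sample.
    signature: iterable of gene names; genes absent from expr are skipped.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for g in genes:
        values = expr[g]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant gene cannot be standardized; skip it
        used += 1
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
    if used == 0:
        raise ValueError("all signature genes are constant across samples")
    return [t / sqrt(used) for t in totals]


def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases (e.g. active TB), 0 for controls; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(aucs, sample_sizes):
    """Mean AUC across datasets, each weighted by its number of samples."""
    total = sum(sample_sizes)
    return sum(a * n for a, n in zip(aucs, sample_sizes)) / total


# Toy example with made-up numbers: two genes of a hypothetical signature
# measured in four samples (two cases followed by two controls).
toy_expr = {"GENE_A": [5.0, 4.0, 1.0, 2.0], "GENE_B": [9.0, 7.0, 3.0, 5.0]}
toy_scores = zscore_gene_set(toy_expr, ["GENE_A", "GENE_B"])
toy_auc = auc(toy_scores, [1, 1, 0, 0])

# Combine the AUCs of one signature across two hypothetical datasets.
summary = weighted_mean_auc([0.9, 0.7], [30, 10])
```

Weighting by sample size lets larger datasets contribute more to a signature's summary AUC than small ones; the paired comparison between a scoring method and the original model would then be run on the per-dataset AUC values.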

MEASUREMENT AND MAIN RESULTS

For many signatures, the predictions from gene set scoring methods were highly correlated with, and statistically equivalent to, the results given by the original diagnostic models. PLAGE outperformed all other gene set scoring methods. In some cases, PLAGE outperformed the original models when considering signatures' weighted mean AUC values and the AUC results within individual studies.

CONCLUSION

Gene set enrichment scoring of existing blood-based biomarker gene sets can distinguish patients with active TB disease from latent TB infection and other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods of published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.
