Wang Xutao, Harper Katie, Sinha Pranay, Johnson W Evan, Patil Prasad
Department of Biostatistics, Boston University School of Public Health, Boston, USA.
Division of Computational Biomedicine, Boston University School of Medicine, Boston, USA.
Tuberculosis (Edinb). 2025 Jul;153:102649. doi: 10.1016/j.tube.2025.102649. Epub 2025 May 8.
Tuberculosis (TB) is the leading cause of infectious disease mortality worldwide. Numerous host blood-based gene expression signatures have been proposed in the literature as alternative tools for diagnosing TB infection. However, the generalizability of these signatures to different patient contexts is not well characterized. There is a pressing need for a well-curated database of TB gene expression studies that supports the systematic assessment of existing and newly developed TB gene signatures.
We built curatedTBData, a manually curated database of 49 human TB transcriptomic studies. This data resource is freely available through GitHub and as an R Bioconductor package that allows users to validate new and existing biomarkers without the challenges of harmonizing heterogeneous studies. We demonstrate the use of this data resource with cross-study comparisons of 72 human host blood-based TB gene signatures. In distinguishing subjects with active TB from healthy controls, 19 gene signatures had a weighted mean AUC of 0.90 or greater, with a maximum of 0.94. In distinguishing active TB disease from latent TB infection, 7 gene signatures had a weighted mean AUC of 0.90 or greater, with a maximum of 0.93.
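The weighted mean AUC reported above summarizes one signature's performance across several studies. The abstract does not state the weighting scheme, so the following Python sketch assumes weights proportional to each study's sample size. The function names `auc` and `weighted_mean_auc` and the toy scores are hypothetical illustrations; they are not part of curatedTBData and are not data from the paper.

```python
from typing import List, Sequence, Tuple


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case
    scores higher than the control; tied pairs count as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def weighted_mean_auc(
    studies: List[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Mean of per-study AUCs, weighted by study sample size.

    ASSUMPTION: sample-size weighting is used for illustration only;
    the source does not specify how the weights were chosen.
    """
    weighted_sum = 0.0
    total_subjects = 0
    for case_scores, control_scores in studies:
        n_subjects = len(case_scores) + len(control_scores)
        weighted_sum += n_subjects * auc(case_scores, control_scores)
        total_subjects += n_subjects
    return weighted_sum / total_subjects


# Toy signature scores (invented): each tuple is (active TB, controls).
toy_studies = [
    ([0.9, 0.8, 0.7], [0.1, 0.2, 0.75]),  # 6 subjects
    ([0.6, 0.9], [0.5, 0.7]),             # 4 subjects
]

if __name__ == "__main__":
    for cases, controls in toy_studies:
        print(f"per-study AUC: {auc(cases, controls):.3f}")
    print(f"weighted mean AUC: {weighted_mean_auc(toy_studies):.3f}")
```

Under this assumed weighting, larger studies contribute more to the summary, so a signature that performs well only in small cohorts receives a lower weighted mean than an unweighted average would suggest.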
The curatedTBData package offers a comprehensive resource of curated human blood-based gene expression data with clinical annotations. This resource will facilitate the development of new signatures that generalize across cohorts or that are better suited to specific subsets of patients.