bvnGPS: a generalizable diagnostic model for acute bacterial and viral infection using integrative host transcriptomics and pretrained neural networks.

Affiliations

Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China.

John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China.

Publication information

Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad109.

PMID: 36857587
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9997702/
Abstract

MOTIVATION

Confusing acute inflammation caused by viral or bacterial infection with noninfectious inflammation can mean missing the optimal window for treatment, resulting in poor prognoses. Diagnostic models based on host gene expression have been widely used to diagnose acute infections, but their clinical use has been hindered by limited generalizability across samples and cohorts, owing to the small sample sizes available for signature discovery and training.

RESULTS

Here, we construct a large-scale dataset integrating multiple host transcriptomic datasets and analyze it with a strategy that removes batch effects and extracts the information shared by different cohorts, based on the relative expression alteration of gene pairs. We assemble 2680 samples across 16 cohorts and separately build gene pair signatures (GPS) for bacterial, viral, and noninfected patients. Using multiclass neural networks, the three GPSs are then assembled into an antibiotic decision model (bacterial-viral-noninfected GPS, bvnGPS) that determines whether a patient has a bacterial infection, a viral infection, or no infection. In the test set (N = 760), bvnGPS distinguishes bacterial infection with an area under the receiver operating characteristic curve (AUC) of 0.953 (95% confidence interval, 0.948-0.958) and viral infection with an AUC of 0.956 (0.951-0.961). In the validation set (N = 147), bvnGPS also performs strongly, attaining an AUC of 0.988 (0.978-0.998) for bacterial-versus-other and 0.994 (0.984-1.000) for viral-versus-other classification. bvnGPS has the potential to be used in clinical practice, and the proposed procedure provides insight into data integration, feature selection, and multiclass classification for host transcriptomic data.
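The gene pair signatures rest on comparing two genes within the same sample instead of comparing absolute expression levels across samples. The following is a minimal sketch of that idea. It assumes each pair is encoded as a binary flag (1 if the first gene is expressed above the second) and shows why such features are unaffected by a per-sample monotonic shift such as a simple batch effect. The gene names, values, and helper functions are hypothetical. The sketch omits the selection of informative pairs and the neural network, and it is not the authors' iPAGE implementation.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def candidate_pairs(genes: Sequence[str]) -> List[Pair]:
    """All unordered gene pairs (a, b), with a listed before b in `genes`."""
    return list(combinations(genes, 2))


def gene_pair_features(sample: Dict[str, float], pairs: Sequence[Pair]) -> List[int]:
    """Encode one sample as 0/1 flags: 1 if gene a is expressed above gene b.

    Ties are encoded as 0. Only the within-sample order of each pair is used,
    so absolute expression levels never enter the feature vector.
    """
    return [1 if sample[a] > sample[b] else 0 for a, b in pairs]


# Hypothetical log-expression values for one patient (placeholder gene names).
sample = {"GENE_A": 7.2, "GENE_B": 3.1, "GENE_C": 5.0}

# The same patient as it might look in another cohort: a simulated batch
# effect applied as a per-sample monotonic transform (scale and shift).
batched = {gene: 2.0 * value + 5.0 for gene, value in sample.items()}

pairs = candidate_pairs(sorted(sample))
print(pairs)                               # [('GENE_A', 'GENE_B'), ('GENE_A', 'GENE_C'), ('GENE_B', 'GENE_C')]
print(gene_pair_features(sample, pairs))   # [1, 1, 0]
print(gene_pair_features(batched, pairs))  # [1, 1, 0]
```

Because only the order of the two genes within a sample matters, any transformation that preserves that order leaves the features unchanged. Distortions that reorder genes within a sample, such as gene-specific batch biases, would not be neutralized this way.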

AVAILABILITY AND IMPLEMENTATION

The code implementing bvnGPS is available at https://github.com/Ritchiegit/bvnGPS. The iPAGE algorithm was implemented and the neural networks were trained in Python 3.7 with Scikit-learn 0.24.1 and PyTorch 1.7. Results were visualized with R 4.2, Python 3.7, and Matplotlib 3.3.4.
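The Results section reports one-versus-rest AUCs (bacterial-versus-other, viral-versus-other) with 95% confidence intervals, but the abstract does not say how the intervals were obtained. The sketch below is a minimal pure-Python illustration of how such figures can be computed. It assumes the model outputs one probability per class (for example, from a softmax layer) and uses a percentile bootstrap as an assumed interval method. The class order, helper names, and toy probabilities are hypothetical. In practice, scikit-learn's `roc_auc_score` would typically compute the AUC.

```python
import random
from typing import Sequence, Tuple

# Hypothetical class order of the model's probability outputs.
CLASSES = ("bacterial", "viral", "noninfected")


def auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs: Sequence[Sequence[float]], labels: Sequence[str], cls: str) -> float:
    """AUC for `cls` versus all other classes, scored by that class's probability."""
    k = CLASSES.index(cls)
    return auc([p[k] for p in probs], [y == cls for y in labels])


def bootstrap_ci(probs, labels, cls, n_boot=1000, alpha=0.05, seed=0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the one-vs-rest AUC (assumed CI method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if cls not in ys or all(y == cls for y in ys):
            continue  # resample lacks positives or negatives, so AUC is undefined
        stats.append(one_vs_rest_auc([probs[i] for i in idx], ys, cls))
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy softmax outputs (bacterial, viral, noninfected) for six hypothetical patients.
probs = [
    (0.80, 0.15, 0.05), (0.60, 0.30, 0.10),  # bacterial
    (0.20, 0.70, 0.10), (0.35, 0.55, 0.10),  # viral
    (0.10, 0.20, 0.70), (0.65, 0.15, 0.20),  # noninfected
]
labels = ["bacterial", "bacterial", "viral", "viral", "noninfected", "noninfected"]

for cls in ("bacterial", "viral"):
    est = one_vs_rest_auc(probs, labels, cls)
    lo, hi = bootstrap_ci(probs, labels, cls)
    print(f"{cls}-versus-other AUC = {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

In this toy data, the last noninfected patient receives a higher bacterial probability than the second bacterial patient, so the bacterial-versus-other AUC is 7/8 = 0.875 rather than 1.0. With only six patients, the bootstrap interval is only illustrative. The reported intervals come from test and validation sets of 760 and 147 samples.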

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/089d/9997702/6d58acd224c7/btad109f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/089d/9997702/dc201ab6e32d/btad109f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/089d/9997702/5f4f5db294af/btad109f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/089d/9997702/d9f392a3d224/btad109f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/089d/9997702/ccaf5101274f/btad109f5.jpg