Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China.
John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China.
Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad109.
Confusing acute inflammation caused by viral or bacterial infection with noninfectious inflammation can delay appropriate therapy and lead to poor prognoses. Diagnostic models based on host gene expression have been widely used to diagnose acute infections, but their clinical use has been hindered by limited generalizability across samples and cohorts, owing to the small sample sizes used for signature discovery and training.
Here, we construct a large-scale dataset that integrates multiple host transcriptomic datasets and analyze it with a strategy that removes batch effects and extracts the information shared across cohorts, based on the relative expression alteration of gene pairs. We assemble 2680 samples across 16 cohorts and build a separate gene pair signature (GPS) for each of the bacterial, viral, and noninfected patient groups. The three GPSs are then combined into an antibiotic decision model (bacterial-viral-noninfected GPS, bvnGPS) using multiclass neural networks, which determines whether a patient has a bacterial infection, a viral infection, or no infection. In the test set (N = 760), bvnGPS distinguishes bacterial infection with an area under the receiver operating characteristic curve (AUC) of 0.953 (95% confidence interval, 0.948-0.958) and viral infection with an AUC of 0.956 (0.951-0.961). In the validation set (N = 147), bvnGPS also performs strongly, attaining an AUC of 0.988 (0.978-0.998) for bacterial-versus-other and an AUC of 0.994 (0.984-1.000) for viral-versus-other. bvnGPS has the potential to be used in clinical practice, and the proposed procedure provides insight into data integration, feature selection, and multiclass classification for host transcriptomic data.
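To illustrate why features based on relative expression within gene pairs can be robust to batch effects, the following minimal sketch encodes each gene pair as a binary within-sample comparison. It is not the iPAGE algorithm or the bvnGPS implementation; the binary encoding, the function name `gene_pair_features`, the gene names, and the expression values are all assumptions made for illustration only.

```python
from itertools import combinations


def gene_pair_features(expression, genes):
    """Encode one sample as binary gene-pair comparisons.

    For each pair (gene_a, gene_b), the feature is 1 if gene_a is expressed
    above gene_b in this sample, else 0. This encoding is an illustrative
    assumption, not the published iPAGE procedure.
    """
    features = {}
    for gene_a, gene_b in combinations(genes, 2):
        features[(gene_a, gene_b)] = int(expression[gene_a] > expression[gene_b])
    return features


# Hypothetical gene names and expression values for a single sample.
genes = ["GENE_A", "GENE_B", "GENE_C"]
sample_cohort_1 = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 3.5}

# The same sample as it might appear in another cohort, distorted by a
# sample-wide increasing transformation (a simplified stand-in for a batch effect).
sample_cohort_2 = {gene: 2.0 * value + 10.0 for gene, value in sample_cohort_1.items()}

features_1 = gene_pair_features(sample_cohort_1, genes)
features_2 = gene_pair_features(sample_cohort_2, genes)

print(features_1)
print(features_1 == features_2)  # within-sample orderings are preserved
```

Because each feature depends only on the ordering of two genes within the same sample, any increasing transformation applied to the whole sample leaves the features unchanged. Real batch effects can be gene-specific, so this shows the intuition rather than a guarantee.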
The code implementing bvnGPS is available at https://github.com/Ritchiegit/bvnGPS. The iPAGE algorithm was implemented and the neural network was trained in Python 3.7 with Scikit-learn 0.24.1 and PyTorch 1.7. The results were visualized using R 4.2, Python 3.7, and Matplotlib 3.3.4.