Myall Ashleigh C, Perkins Simon, Rushton David, David Jonathan, Spencer Phillippa, Jones Andrew R, Antczak Philipp
Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK.
Department of Mathematics, Imperial College London, London SW7 2AZ, UK.
Bioinformatics. 2021 Aug 25;37(16):2347-2355. doi: 10.1093/bioinformatics/btab089.
A fundamental problem for disease treatment is that while antibiotics are a powerful counter to bacteria, they are ineffective against viruses. Often, bacterial and viral infections are confused due to their similar symptoms and lack of rapid diagnostics. With many clinicians relying primarily on symptoms for diagnosis, overuse and misuse of modern antibiotics are rife, contributing to the growing pool of antibiotic resistance. To ensure an individual receives optimal treatment given their disease state and to reduce over-prescription of antibiotics, the host response can in theory be measured quickly to distinguish between the two states. To establish a predictive biomarker panel of disease state (viral/bacterial/no-infection), we conducted a meta-analysis of human blood infection studies using machine learning.
We focused on publicly available gene expression data from two widely used platforms, Affymetrix and Illumina microarrays, as they represented a significant proportion of the available data. We developed multi-class models with high accuracy; our best model correctly predicted 93% of bacterial and 89% of viral samples. To compare the features selected on each platform, we reverse-engineered the underlying molecular regulatory network and explored the neighbourhood of the selected features. The networks showed that, although the models differed at the gene level, they contained genes from the same areas of the network. Specifically, the models converged on pathways including Type I interferon signalling, chemotaxis, apoptotic processes and the inflammatory/innate response.
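The per-class figures quoted above (the share of bacterial and of viral samples predicted correctly) correspond to what is usually called per-class recall. The following is a minimal sketch of that calculation for a three-class problem, not the authors' code: the function name `per_class_recall` and the toy labels are hypothetical, chosen only to illustrate the metric.

```python
from collections import Counter

# Class names follow the disease states named in the abstract.
CLASSES = ("bacterial", "viral", "no-infection")


def per_class_recall(true_labels, predicted_labels, classes=CLASSES):
    """Return, for each class, the fraction of its samples predicted correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of samples that truly belong to each class.
    totals = Counter(true_labels)
    # Number of samples per class whose prediction matches the true label.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    # Skip classes with no samples to avoid dividing by zero.
    return {c: correct[c] / totals[c] for c in classes if totals[c] > 0}


# Toy labels invented for illustration only (not data from the study).
y_true = ["bacterial"] * 4 + ["viral"] * 4 + ["no-infection"] * 2
y_pred = [
    "bacterial", "bacterial", "bacterial", "viral",
    "viral", "viral", "viral", "viral",
    "no-infection", "bacterial",
]

recall = per_class_recall(y_true, y_pred)
print(recall)
```

Reporting recall per class rather than a single overall accuracy makes it visible when a model performs well on one infection type but poorly on another, which matters when class sizes are unequal.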
Data and code are available on the Gene Expression Omnibus and GitHub.
Supplementary data are available at Bioinformatics online.