Veterinary Education, Research, and Outreach Center, Texas A&M University and West Texas A&M University, Canyon, TX, USA.
Department of Pathobiology and Population Medicine, Mississippi State University, Mississippi State, MS, USA.
Sci Rep. 2021 Nov 25;11(1):22916. doi: 10.1038/s41598-021-02343-7.
Bovine respiratory disease (BRD) is a multifactorial disease involving complex host immune interactions shaped by pathogenic agents and environmental factors. Advancements in RNA sequencing and associated analytical methods are improving our understanding of host response related to BRD pathophysiology. Supervised machine learning (ML) approaches present one such method for analyzing new and previously published transcriptome data to identify novel disease-associated genes and mechanisms. Our objective was to apply ML models to lung and immunological tissue datasets acquired from previous clinical BRD experiments to identify genes that classify disease with high accuracy. Raw mRNA sequencing reads from 151 bovine datasets (n = 123 BRD, n = 28 control) were downloaded from NCBI-GEO. Quality filtered reads were assembled in a HISAT2/Stringtie2 pipeline. Raw gene counts for ML analysis were normalized, transformed, and analyzed with MLSeq, utilizing six ML models. Cross-validation parameters (fivefold, repeated 10 times) were applied to 70% of the compiled datasets for ML model training and parameter tuning; optimized ML models were tested with the remaining 30%. Downstream analysis of significant genes identified by the top ML models, based on classification accuracy for each etiological association, was performed within WebGestalt and Reactome (FDR ≤ 0.05). Nearest shrunken centroid and Poisson linear discriminant analysis with power transformation models identified 154 and 195 significant genes for IBR and BRSV, respectively; from these genes, the two ML models discriminated IBR and BRSV with 100% accuracy compared to sham controls. Significant genes classified by the top ML models in IBR (154) and BRSV (195), but not BVDV (74), were related to type I interferon production and IL-8 secretion, specifically in lymphoid tissue and not homogenized lung tissue. Genes identified in Mannheimia haemolytica infections (97) were involved in activating classical and alternative pathways of complement. Novel findings, including expression of genes related to reduced mitochondrial oxygenation and ATP synthesis in consolidated lung tissue, were discovered. Genes identified in each analysis represent distinct genomic events relevant to understanding and predicting clinical BRD. Our analysis demonstrates the utility of ML with published datasets for discovering functional information to support the prediction and understanding of clinical BRD.
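The resampling scheme described above (a 70%/30% train/test partition, with fivefold cross-validation repeated 10 times on the training portion) can be illustrated with a minimal sketch. This is not the authors' MLSeq (R) workflow: the function names are hypothetical, stratifying by class is an assumption the abstract does not state, and the abstract does not say whether the partition was made once over all 151 datasets or separately for each etiological analysis. Only the sample counts (123 BRD, 28 control) are taken from the text.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test, keeping class proportions similar.

    Stratification is an assumption; the abstract only gives the 70/30 ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def repeated_stratified_kfold(indices, labels, k=5, repeats=10, seed=0):
    """Yield (fit_idx, validate_idx) pairs: k folds, reshuffled `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        by_class = defaultdict(list)
        for i in indices:
            by_class[labels[i]].append(i)
        pos = 0
        for y in sorted(by_class):
            idx = by_class[y][:]
            rng.shuffle(idx)
            # Deal samples round-robin so each fold gets a share of every class.
            for i in idx:
                folds[pos % k].append(i)
                pos += 1
        for f in range(k):
            validate = sorted(folds[f])
            fit = sorted(i for g in range(k) if g != f for i in folds[g])
            yield fit, validate


# Sample counts from the abstract: 123 BRD and 28 control datasets.
labels = ["BRD"] * 123 + ["control"] * 28
train_idx, test_idx = stratified_split(labels)
splits = list(repeated_stratified_kfold(train_idx, labels))
print(len(train_idx), len(test_idx), len(splits))
```

In a workflow of this shape, each of the 50 fit/validate pairs would be used to tune model parameters, and the held-out test indices would be touched only once, to score the tuned model.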