Massachusetts General Hospital Institute of Health Professions, Boston, MA, USA.
Department of Pediatrics, Yale University School of Medicine, New Haven, CT, USA.
BMC Med Genomics. 2021 Dec 1;14(1):285. doi: 10.1186/s12920-021-01136-1.
We previously identified differentially expressed genes on the basis of false discovery rate-adjusted P values using empirical Bayes moderated tests. However, that approach yielded a subset of differentially expressed genes without accounting for redundancy among the selected genes.
This study is a secondary analysis of a case-control study of the effect of antiretroviral therapy on apoptosis pathway genes, comprising 16 cases (HIV-infected with mitochondrial toxicity) and 16 controls (uninfected). We applied the maximum relevance minimum redundancy (mRMR) algorithm to the genes that were differentially expressed between cases and controls. The mRMR algorithm iteratively selects features (genes) that are maximally relevant for class prediction and minimally redundant with one another. We implemented several machine learning classifiers and tested the prediction accuracy of the two top-ranked mRMR genes. We then used network analysis to estimate and visualize the associations among the differentially expressed genes, employing Markov random field (undirected network) models to identify gene networks related to mitochondrial toxicity. The Spinglass model was used to identify communities (clusters) of genes.
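The greedy selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes absolute Pearson correlation as a stand-in for the mutual information that mRMR typically uses, it scores candidates by relevance minus mean redundancy, and the function names (`pearson`, `mrmr_select`), the gene labels (`GENE_A`, `GENE_B`, `GENE_C`), and the expression values are all hypothetical toy inputs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy mRMR-style selection (correlation-based variant, an assumption).

    features: dict mapping gene name -> list of expression values
    labels:   list of 0/1 class labels (control/case)
    k:        number of genes to select
    """
    # Relevance: absolute correlation between each gene and the class label.
    relevance = {g: abs(pearson(v, labels)) for g, v in features.items()}
    selected = []
    remaining = set(features)

    def score(gene):
        if not selected:
            return relevance[gene]
        # Redundancy: mean absolute correlation with genes already selected.
        redundancy = sum(
            abs(pearson(features[gene], features[s])) for s in selected
        ) / len(selected)
        return relevance[gene] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 4 controls (0) followed by 4 cases (1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "GENE_A": [1, 2, 1, 2, 5, 6, 5, 6],  # strongly related to the label
    "GENE_B": [1, 2, 1, 2, 5, 6, 5, 7],  # near-duplicate of GENE_A
    "GENE_C": [1, 1, 3, 3, 2, 2, 4, 4],  # weaker signal, less redundant
}
selected = mrmr_select(features, labels, k=2)
print(selected)
```

In this toy setting the near-duplicate gene is penalized for its redundancy with the first pick, which is the behavior that separates mRMR from ranking by relevance alone.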
The mRMR algorithm ranked DFFA and TNFRSF1A, two of the upregulated proapoptotic genes, at the top. Using these two mRMR genes, the overall prediction accuracy was 86%; that is, 86% of the participants were correctly classified into their respective groups. The estimated network models showed different patterns of gene networks. In the network of the cases, FASLG was the most central gene, whereas in the controls ABL1 and LTBR had the highest centrality.
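To make the notion of a "most central" gene concrete, the following minimal sketch computes strength centrality, the sum of absolute edge weights attached to each node in an undirected weighted network. The abstract does not state which centrality measure was used, so strength is an assumption here, and the node names (`G1` to `G4`) and edge weights are hypothetical.

```python
def strength_centrality(edges):
    """Sum of absolute edge weights per node in an undirected weighted network.

    edges: dict mapping (node_u, node_v) -> weight; each pair is listed once.
    """
    strength = {}
    for (u, v), weight in edges.items():
        strength[u] = strength.get(u, 0.0) + abs(weight)
        strength[v] = strength.get(v, 0.0) + abs(weight)
    return strength


# Hypothetical toy network; weights stand in for estimated partial associations.
toy_edges = {
    ("G1", "G2"): 0.4,
    ("G1", "G3"): -0.3,
    ("G2", "G3"): 0.1,
    ("G3", "G4"): 0.2,
}
strength = strength_centrality(toy_edges)
most_central = max(strength, key=strength.get)
print(most_central, strength)
```

Comparing such per-node scores between a network estimated in cases and one estimated in controls is one way to see which genes change position between the two groups.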
The mRMR algorithm and network analysis revealed new correlations among genes associated with mitochondrial toxicity.