Transcriptomics and machine learning to advance schizophrenia genetics: A case-control study using post-mortem brain data.

Affiliations

Department of Human Genetics, McGill University, Montreal, QC, Canada.

Faculty of Science, McGill University, Montreal, QC, Canada.

Publication information

Comput Methods Programs Biomed. 2022 Feb;214:106590. doi: 10.1016/j.cmpb.2021.106590. Epub 2021 Dec 16.

Abstract

BACKGROUND AND OBJECTIVE

Alterations of the expression of a variety of genes have been reported in patients with schizophrenia (SCZ). Moreover, machine learning (ML) analysis of gene expression microarray data has shown promising preliminary results in the study of SCZ. Our objective was to evaluate the performance of ML in classifying SCZ cases and controls based on gene expression microarray data from the dorsolateral prefrontal cortex.

METHODS

We applied a state-of-the-art ML algorithm (XGBoost) to train and evaluate a classification model using 201 SCZ cases and 278 controls. We used 10-fold cross-validation for model selection and a held-out testing set to evaluate the model. The metric used to evaluate classification performance was the area under the receiver operating characteristic curve (AUC).
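
As a rough illustration of the evaluation protocol described above, the sketch below computes AUC as the probability that a randomly chosen case is scored higher than a randomly chosen control, and partitions sample indices into a held-out testing set plus 10 cross-validation folds. It is not the authors' code: the function names, the 20% test fraction, and the random seed are assumptions made for illustration. The split is not stratified by diagnosis, and the XGBoost model itself is omitted.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_indices(n_samples, test_fraction=0.2, k=10, seed=0):
    """Return (test_idx, folds): a held-out test set and k disjoint CV folds.

    test_fraction and seed are hypothetical; the abstract does not report them.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n_samples * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Deal the remaining indices round-robin into k folds.
    folds = [train_idx[i::k] for i in range(k)]
    return test_idx, folds


# 201 cases + 278 controls = 479 samples, as reported in the abstract.
test_idx, folds = split_indices(201 + 278)

# Toy labels and scores, only to show how the metric behaves.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a full pipeline, each fold would serve once as the validation set while the model is fit on the other nine, and the held-out indices would be scored only once, after model selection.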

RESULTS

The average AUC across 10-fold cross-validation was 0.76, and the AUC on testing data not used during training was also 0.76. Analysis of the rolling balanced classification accuracy from high to low prediction confidence levels showed that balanced accuracy for the most certain subset of predictions ranged between 80% and 90%. The ML model utilized 182 gene expression probes. Classification performance improved further when an automated ML strategy was applied to the 182 features, achieving an AUC of 0.79 on the same testing data. We found literature evidence linking all of the top ten ML-ranked genes to SCZ. Furthermore, we leveraged information from the full set of available microarray gene expression data via univariate differential gene expression analysis. We then prioritized differentially expressed gene sets using the piano gene set analysis package. We augmented the ranking of the prioritized gene sets with genes from the complex multivariate ML model, using hypergeometric tests to identify more robust gene sets. We identified two significant Gene Ontology molecular function gene sets: "oxidoreductase activity, acting on the CH-NH2 group of donors" and "integrin binding." Lastly, we present candidate treatments for SCZ based on findings from our study.

CONCLUSIONS

Overall, we observed above-chance performance from ML classification of SCZ cases and controls based on brain gene expression microarray data, and found that ML analysis of gene expression could further our understanding of the pathophysiology of SCZ and help identify novel treatments.
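
The hypergeometric overlap test mentioned in the Results can be sketched as follows. Given N measured genes, a gene set of size K, and n genes selected by the ML model, it gives the probability of seeing at least k of the selected genes in the set by chance alone. This is a minimal sketch under assumptions: the function name and the toy numbers are illustrative, and the abstract does not state how the gene universe was defined or whether a multiple-testing correction was applied.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total genes in the universe
    K: genes in the gene set
    n: genes selected by the ML model
    k: observed overlap between the gene set and the ML-selected genes
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lo = max(k, 0)
    hi = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favorable / comb(N, n)


# Toy numbers (not from the study): 10 genes, a set of 3, 4 selected, overlap of 2.
p_value = hypergeom_upper_tail(k=2, N=10, K=3, n=4)
print(p_value)  # 70 / 210, about 0.333
```

A small p-value would suggest that the ML-selected genes are over-represented in the gene set relative to a random draw of the same size.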
