Genome-wide association studies of ischemic stroke based on interpretable machine learning.

Authors

Nikolić Stefan, Ignatov Dmitry I, Khvorykh Gennady V, Limborska Svetlana A, Khrunin Andrey V

Affiliations

Laboratory for Models and Methods of Computational Pragmatics; Department of Data Analysis and Artificial Intelligence, HSE University, Moscow, Russia.

National Research Centre "Kurchatov Institute", Moscow, Russia.

Publication

PeerJ Comput Sci. 2024 Nov 6;10:e2454. doi: 10.7717/peerj-cs.2454. eCollection 2024.

Abstract

Despite the identification of several dozen genetic loci associated with ischemic stroke (IS), the genetic bases of this disease remain largely unexplored. In this research we present the results of genome-wide association studies (GWAS) based on classical statistical testing and machine learning algorithms (logistic regression, gradient boosting on decision trees, and the tabular deep learning model TabNet). To build a consensus among the results obtained by the different techniques, a Pareto-optimal solution was proposed and applied. These methods were applied to real genotypic data of sick and healthy individuals of European ancestry obtained from the Database of Genotypes and Phenotypes (5,581 individuals, 883,749 single nucleotide polymorphisms). Finally, 131 genes were identified as candidates for association with the onset of IS. , , and were previously described as associated with the course of IS in model animals. taking part in metabolism of fatty acids was shown for the first time to be associated with IS. The identified genes were compared with genes from the Illuminating Druggable Genome project. The product of representing the G protein-coupled receptor can be considered as a therapeutic target for stroke prevention. The approaches presented in this research can be used to reprocess GWAS datasets from other diseases.
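The abstract does not spell out how the Pareto-optimal consensus is computed, so the following is only a minimal sketch of the general idea: given one importance score per SNP from each method, keep the SNPs that no other SNP beats on every method at once. The SNP identifiers, the score values, and the `pareto_front` helper are hypothetical and are not taken from the paper.

```python
import numpy as np


def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Return a boolean mask of non-dominated rows.

    Each row is one SNP, each column is one method's score,
    and higher is assumed to be better in every column.
    """
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        # Row j dominates row i if it is >= in every column and > in at least one.
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            keep[i] = False
    return keep


# Hypothetical scores for five SNPs from three methods
# (for example a statistical test, gradient boosting, and TabNet).
snp_ids = ["snp_a", "snp_b", "snp_c", "snp_d", "snp_e"]
scores = np.array([
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.6],
    [0.3, 0.1, 0.2],
    [0.9, 0.2, 0.4],
    [0.5, 0.5, 0.9],
])

mask = pareto_front(scores)
selected = [snp for snp, kept in zip(snp_ids, mask) if kept]
print(selected)
```

This pairwise check is quadratic in the number of SNPs, so at the scale reported in the abstract it would likely be applied only after a pre-filtering step or replaced by a faster front-finding routine.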

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/16b6/11623107/752e1c1d667c/peerj-cs-10-2454-g001.jpg
