

Prioritization of risk genes for Alzheimer's disease: an analysis framework using spatial and temporal gene expression data in the human brain based on support vector machine.

Author information

Wang Shiyu, Fang Xixian, Wen Xiang, Yang Congying, Yang Ying, Zhang Tianxiao

Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, China.

Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Beijing, China.

Publication information

Front Genet. 2023 Oct 6;14:1190863. doi: 10.3389/fgene.2023.1190863. eCollection 2023.

Abstract

Alzheimer's disease (AD) is a complex disorder whose risk is influenced by multiple genetic and environmental factors. In this study, we proposed an AD risk gene prediction framework based on the spatial and temporal features of gene expression data (STGE). Gene expression data from different tissues and donor ages were used as model features. Human genes were classified into AD risk and non-risk sets based on information extracted from relevant databases. Support vector machine (SVM) models were constructed to capture the expression patterns of genes believed to contribute to AD risk, and the recursive feature elimination (RFE) method was used for feature selection. Before feature selection, data for 64 tissue-age features were available; RFE reduced this number to 19. SVM models were then built and evaluated using both the 19 selected features and the full feature set. The area under the curve (AUC) values of the SVM model based on the 19 selected features (0.740 [0.690-0.790]) and on the full feature set (0.730 [0.678-0.769]) were very similar. Fifteen genes were predicted to be AD risk genes with a probability greater than 90%. The newly proposed framework performed comparably to previous prediction methods based on protein-protein interaction (PPI) network properties. The resulting list of 15 candidate AD risk genes provides data support for further studies on the genetic etiology of AD.
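The abstract gives no implementation details, but the SVM-RFE loop it describes (train an SVM, discard the features with the smallest weights, retrain until the target count remains) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a linear SVM (trained here by simple full-batch subgradient descent instead of a library solver), removes one feature per round ranked by squared weight, and runs on a small hypothetical toy dataset in place of the real 64 tissue-age features. The names `train_linear_svm` and `svm_rfe` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Linear SVM trained by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    X: list of feature rows; y: labels in {-1, +1}. Returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep, step=1):
    """Recursive feature elimination: retrain, drop the `step` features
    with the smallest squared weight, repeat until n_keep remain.
    Returns the original column indices of the surviving features."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        order = sorted(range(len(remaining)), key=lambda k: w[k] ** 2)
        n_drop = min(step, len(remaining) - n_keep)
        dropped = set(order[:n_drop])
        remaining = [f for k, f in enumerate(remaining) if k not in dropped]
    return remaining


# Hypothetical toy data: 6 "tissue-age" columns; only columns 0 and 1
# carry signal, the other four are uniform noise.
rng = random.Random(42)
X, y = [], []
for i in range(60):
    label = 1 if i % 2 == 0 else -1
    row = [2.0 * label + rng.gauss(0, 0.2), 1.5 * label + rng.gauss(0, 0.2)]
    row += [rng.uniform(-1, 1) for _ in range(4)]
    X.append(row)
    y.append(label)

selected = svm_rfe(X, y, n_keep=2)
print(sorted(selected))
```

In the study's setting, `X` would hold one row per gene and one column per tissue-age feature, `y` would mark risk (+1) versus non-risk (-1) genes, and `n_keep=19` would mirror the reported feature count. On this toy data the two informative columns are the ones expected to survive. In practice a library solver, for example scikit-learn's `RFE` wrapped around a linear `SVC`, would replace the hand-written trainer.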
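The abstract reports AUC values with bracketed intervals (for example 0.740 [0.690-0.790]) but does not state how the intervals were obtained. The sketch below assumes one common approach: AUC computed as the Mann-Whitney probability that a risk gene scores above a non-risk gene, with a percentile bootstrap over genes for the interval. The functions `auc` and `bootstrap_auc_ci` and the scores and labels are hypothetical placeholders, not data or code from the study.

```python
import random


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs where the positive scores higher;
    ties count as one half. labels are 1 (risk) or 0 (non-risk)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) interval.
    Assumption: the paper's interval method is not given in the abstract.
    Resamples genes with replacement; skips resamples lacking a class."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if 0 < sum(boot_labels) < n:  # both classes must be present
            stats.append(auc([scores[i] for i in idx], boot_labels))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return auc(scores, labels), lower, upper


# Hypothetical decision scores for 8 genes (1 = labelled AD risk gene).
scores = [0.92, 0.81, 0.40, 0.77, 0.35, 0.52, 0.10, 0.66]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
point, lower, upper = bootstrap_auc_ci(scores, labels)
print(f"AUC = {point:.3f} [{lower:.3f}-{upper:.3f}]")
```

Here 13 of the 16 risk/non-risk pairs are ranked correctly, so the point AUC is 0.8125; the interval depends on the bootstrap seed and sample size.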


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5642/10587557/6336a9df93eb/fgene-14-1190863-g001.jpg
