Data mining of high density genomic variant data for prediction of Alzheimer's disease risk.

Affiliation

Computational Biosciences Program, School of Mathematics and Statistical Sciences, Arizona State University, 1711 South Rural Road, Tempe, Arizona 85287-1804, USA.

Publication information

BMC Med Genet. 2012 Jan 25;13:7. doi: 10.1186/1471-2350-13-7.

DOI:10.1186/1471-2350-13-7

PMID:22273362

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3355044/

Abstract

BACKGROUND

The discovery of genetic associations is an important step in understanding human illness and deriving disease pathways. Identifying multiple interacting genetic mutations associated with disease remains a challenge in studying the etiology of complex diseases. Although genome-wide association studies (GWAS) have recently confirmed new single nucleotide polymorphisms (SNPs) at genes implicated in immune response, cholesterol/lipid metabolism, and cell membrane processes as associated with late-onset Alzheimer's disease (LOAD), a portion of AD heritability remains unexplained. We sought additional genetic variants that may influence LOAD risk using data mining methods.

METHODS

Two approaches were devised to select SNPs associated with LOAD in a publicly available GWAS data set consisting of three cohorts. In both approaches, single-locus analysis (logistic regression) was used to filter the data with a p-value threshold less conservative than the Bonferroni threshold; the resulting subset of SNPs was then used in multi-locus analysis (random forest (RF)). In the second approach, we took prior biological knowledge into account and performed sample stratification and linkage disequilibrium (LD) analysis, in addition to logistic regression analysis, to preselect loci as input to the RF classifier construction step.
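The single-locus filtering step can be illustrated with a minimal sketch. It contrasts a Bonferroni threshold with a relaxed cutoff when preselecting SNPs for a later multi-locus model. The p-values, the number of tests (300,000), the relaxed cutoff (1e-3), and the function names are all assumptions made for illustration; the abstract does not state the values the authors used.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def prefilter_snps(p_values: Dict[str, float], threshold: float) -> List[str]:
    """Return SNP ids with p-value below the threshold, most significant first."""
    kept = [snp for snp, p in p_values.items() if p < threshold]
    return sorted(kept, key=lambda snp: p_values[snp])


# Hypothetical single-locus p-values (illustrative only, not from the study).
p_values = {"rs_a": 2e-9, "rs_b": 4e-5, "rs_c": 8e-4, "rs_d": 0.03, "rs_e": 0.6}

strict = bonferroni_threshold(alpha=0.05, n_tests=300_000)  # assumed test count
lenient = 1e-3  # assumed relaxed cutoff; the real value is not given in the abstract

strict_set = prefilter_snps(p_values, strict)    # survives Bonferroni correction
lenient_set = prefilter_snps(p_values, lenient)  # candidate input for the RF step

print("Bonferroni threshold:", strict)
print("kept under Bonferroni:", strict_set)
print("kept under relaxed cutoff:", lenient_set)
```

A relaxed cutoff passes more loci to the multi-locus stage, which lets the classifier consider variants whose individual effects are too weak to survive genome-wide correction.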

RESULTS

The first approach yielded 199 SNPs, mostly associated with genes in calcium signaling, cell adhesion, endocytosis, immune response, and synaptic function. Together with APOE and GAB2 SNPs, these SNPs formed a predictive subset for LOAD status with an average error of 9.8% under 10-fold cross validation (CV) in RF modeling. The second approach identified nineteen variants in LD with ST5, TRPC1, ATG10, ANO3, NDUFA12, and NISCH, genes linked directly or indirectly with neurobiology. These variants were part of a model that also included APOE and GAB2 SNPs to predict LOAD risk, and that model produced a 10-fold CV average error of 17.5% in the classification modeling.
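As a rough illustration of how a 10-fold CV average error is computed, the sketch below splits samples into ten folds, trains on nine, scores the held-out fold, and averages the per-fold error rates. The majority-class classifier is a deliberately trivial stand-in for the random forest, and the toy genotype data, function names, and seed are assumptions, not the study's data or code.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (genotype vector, case/control label)
Predictor = Callable[[Sequence[int]], int]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_average_error(data: List[Sample],
                     fit: Callable[[List[Sample]], Predictor],
                     k: int = 10, seed: int = 0) -> float:
    """Mean misclassification rate over k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        predict = fit(train)  # the model never sees the held-out fold
        wrong = sum(predict(data[i][0]) != data[i][1] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k


def fit_majority(train: List[Sample]) -> Predictor:
    """Stand-in classifier (not a random forest): always predict the majority label."""
    ones = sum(label for _, label in train)
    majority = 1 if ones * 2 > len(train) else 0
    return lambda genotype: majority


# Toy data: 60 controls (label 0) and 40 cases (label 1), genotypes coded 0/1/2.
data = [((0, 1, 2), 0)] * 60 + [((2, 1, 0), 1)] * 40
avg_err = cv_average_error(data, fit_majority, k=10)
print(f"10-fold CV average error: {avg_err:.3f}")
```

Any classifier that follows the same `fit(train) -> predict` shape could replace `fit_majority`, so a random forest implementation would plug into `cv_average_error` unchanged.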

CONCLUSIONS

With the two proposed approaches, we identified a large subset of SNPs in genes mostly clustered around specific pathways/functions and a smaller set of SNPs, within or in proximity to five genes not previously reported, that may be relevant for the prediction/understanding of AD.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9157/3355044/a4eb7e949028/1471-2350-13-7-1.jpg
