Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes.

Publication information

BMC Bioinformatics. 2013;14 Suppl 16(Suppl 16):S6. doi: 10.1186/1471-2105-14-S16-S6. Epub 2013 Oct 22.

DOI: 10.1186/1471-2105-14-S16-S6
PMID: 24564704
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3853073/
Abstract

MOTIVATION

Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forest (RF) classifiers, which are ensembles of decision trees, are among the best-performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive.

RESULTS

We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in human brain structure. PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity.
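
The abstract does not spell out the distance-based splitting criterion, so the following is only a minimal sketch of one way such a criterion can work, not PaRFR's actual code. It assumes Euclidean distances between trait vectors and genotypes coded as minor-allele counts (0, 1, 2); the function names `pairwise_sq_dists`, `within_node_scatter` and `best_split` are hypothetical. The idea illustrated is that the scatter of a high-dimensional trait inside a tree node can be computed from a precomputed matrix of pairwise squared distances, so the trait vectors themselves need not be revisited for every candidate split.

```python
import numpy as np

def pairwise_sq_dists(Y):
    """Squared Euclidean distances between all rows of the trait matrix Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return (diff ** 2).sum(axis=-1)

def within_node_scatter(D2, idx):
    """Sum of squared deviations from the node centroid, from distances alone.

    Uses the identity sum_i ||y_i - mean||^2 = (1 / (2n)) * sum_{i,j} d_ij^2,
    where the double sum runs over all ordered pairs in the node.
    """
    n = len(idx)
    if n == 0:
        return 0.0
    return D2[np.ix_(idx, idx)].sum() / (2.0 * n)

def best_split(G, D2, node_idx):
    """Return (snp, threshold, cost) minimising the summed scatter of the children.

    G is an (n_samples, n_snps) array of minor-allele counts (0, 1 or 2).
    """
    best = (None, None, np.inf)
    for snp in range(G.shape[1]):
        g = G[node_idx, snp]
        for thr in (0, 1):  # split as g <= thr versus g > thr
            left, right = node_idx[g <= thr], node_idx[g > thr]
            if len(left) == 0 or len(right) == 0:
                continue  # not a real split
            cost = within_node_scatter(D2, left) + within_node_scatter(D2, right)
            if cost < best[2]:
                best = (snp, thr, cost)
    return best

# Toy data (made up for illustration): four subjects, a 2-dimensional trait, two SNPs.
Y = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
G = np.array([[0, 0], [1, 0], [0, 2], [1, 2]])
D2 = pairwise_sq_dists(Y)
snp, thr, cost = best_split(G, D2, np.arange(len(Y)))
print(snp, thr)  # the second SNP (index 1) separates the two trait clusters
```

In this sketch the distance matrix is built once, and each candidate split only sums sub-blocks of it. Because the trees of a forest are grown independently of one another, a routine like this could in principle run inside separate map tasks, although the abstract does not say how PaRFR divides the work.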

AVAILABILITY

The Java code is freely available at http://www2.imperial.ac.uk/~gmontana.

Similar articles

1
Random forests on distance matrices for imaging genetics studies.
Stat Appl Genet Mol Biol. 2013 Dec;12(6):757-86. doi: 10.1515/sagmb-2013-0040.
2
Identifying Candidate Genetic Associations with MRI-Derived AD-Related ROI via Tree-Guided Sparse Learning.
IEEE/ACM Trans Comput Biol Bioinform. 2019 Nov-Dec;16(6):1986-1996. doi: 10.1109/TCBB.2018.2833487. Epub 2018 May 7.
3
Hippocampal atrophy as a quantitative trait in a genome-wide association study identifying novel susceptibility genes for Alzheimer's disease.
PLoS One. 2009 Aug 7;4(8):e6501. doi: 10.1371/journal.pone.0006501.
4
Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.
Neuroimage. 2017 Feb 1;146:983-1002. doi: 10.1016/j.neuroimage.2016.09.055. Epub 2016 Oct 4.
5
Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
6
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression.
Neuroimage. 2012 Nov 15;63(3):1681-94. doi: 10.1016/j.neuroimage.2012.08.002. Epub 2012 Aug 15.
7
Adaptive testing for multiple traits in a proportional odds model with applications to detect SNP-brain network associations.
Genet Epidemiol. 2017 Apr;41(3):259-277. doi: 10.1002/gepi.22033. Epub 2017 Feb 13.
8
Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer's disease.
Neuroimage. 2012 Mar;60(1):700-16. doi: 10.1016/j.neuroimage.2011.12.029. Epub 2011 Dec 22.
9
Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: A study of the ADNI cohort.
Neuroimage. 2010 Nov 15;53(3):1051-63. doi: 10.1016/j.neuroimage.2010.01.042. Epub 2010 Jan 25.

Cited by

1
Applications of Artificial Intelligence for Heat Stress Management in Ruminant Livestock.
Sensors (Basel). 2024 Sep 11;24(18):5890. doi: 10.3390/s24185890.
2
Machine learning methods applied to genotyping data capture interactions between single nucleotide variants in late onset Alzheimer's disease.
Alzheimers Dement (Amst). 2022 Apr 5;14(1):e12300. doi: 10.1002/dad2.12300. eCollection 2022.
3
Cardiovascular Imaging and Intervention Through the Lens of Artificial Intelligence.
Interv Cardiol. 2021 Oct 20;16:e31. doi: 10.15420/icr.2020.04. eCollection 2021 Apr.
4
The Application of Artificial Intelligence in the Genetic Study of Alzheimer's Disease.
Aging Dis. 2020 Dec 1;11(6):1567-1584. doi: 10.14336/AD.2020.0312. eCollection 2020 Dec.
5
The Birth of Bio-data Science: Trends, Expectations, and Applications.
Genomics Proteomics Bioinformatics. 2020 Feb;18(1):5-15. doi: 10.1016/j.gpb.2020.01.002. Epub 2020 May 16.
6
Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci.
Front Genet. 2020 Apr 15;11:350. doi: 10.3389/fgene.2020.00350. eCollection 2020.
7
Exploring the druggable space around the Fanconi anemia pathway using machine learning and mechanistic models.
BMC Bioinformatics. 2019 Jul 2;20(1):370. doi: 10.1186/s12859-019-2969-0.
8
A Magnetoencephalographic/Encephalographic (MEG/EEG) Brain-Computer Interface Driver for Interactive iOS Mobile Videogame Applications Utilizing the Hadoop Ecosystem, MongoDB, and Cassandra NoSQL Databases.
Diseases. 2018 Sep 28;6(4):89. doi: 10.3390/diseases6040089.
9
Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services.
Comput Math Methods Med. 2017;2017:6120820. doi: 10.1155/2017/6120820. Epub 2017 Dec 11.
10
Single Marker and Haplotype-Based Association Analysis of Semolina and Pasta Colour in Elite Durum Wheat Breeding Lines Using a High-Density Consensus Map.
PLoS One. 2017 Jan 30;12(1):e0170941. doi: 10.1371/journal.pone.0170941. eCollection 2017.

References

1
Improved statistical model checking methods for pathway analysis.
BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S15. doi: 10.1186/1471-2105-13-S17-S15. Epub 2012 Dec 13.
2
A genome-wide association study identifies five loci influencing facial morphology in Europeans.
PLoS Genet. 2012 Sep;8(9):e1002932. doi: 10.1371/journal.pgen.1002932. Epub 2012 Sep 13.
3
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression.
Neuroimage. 2012 Nov 15;63(3):1681-94. doi: 10.1016/j.neuroimage.2012.08.002. Epub 2012 Aug 15.
4
Random forests for genetic association studies.
Stat Appl Genet Mol Biol. 2011;10(1):32. doi: 10.2202/1544-6115.1691. Epub 2011 Jul 12.
5
Neuroimaging and genetics: exploring, searching, and finding.
Twin Res Hum Genet. 2012 Jun;15(3):267-72. doi: 10.1017/thg.2012.20.
6
Multilocus genetic analysis of brain images.
Front Genet. 2011 Oct 21;2:73. doi: 10.3389/fgene.2011.00073. eCollection 2011.
7
Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer's disease.
Neuroimage. 2012 Mar;60(1):700-16. doi: 10.1016/j.neuroimage.2011.12.029. Epub 2011 Dec 22.
8
The future of fMRI and genetics research.
Neuroimage. 2012 Aug 15;62(2):1286-92. doi: 10.1016/j.neuroimage.2011.10.063. Epub 2011 Oct 28.
9
Distance-based differential analysis of gene curves.
Bioinformatics. 2011 Nov 15;27(22):3135-41. doi: 10.1093/bioinformatics/btr528. Epub 2011 Oct 7.
10
Power of data mining methods to detect genetic associations and interactions.
Hum Hered. 2011;72(2):85-97. doi: 10.1159/000330579. Epub 2011 Sep 17.