Data mining, neural nets, trees--problems 2 and 3 of Genetic Analysis Workshop 15.

Author information

Ziegler Andreas, DeStefano Anita L, König Inke R, Bardel Claire, Brinza Dumitru, Bull Shelley, Cai Zhaohui, Glaser Beate, Jiang Wei, Lee Kristine E, Li Chuang Xing, Li Jing, Li Xin, Majoram Paul, Meng Yan, Nicodemus Kristin K, Platt Alexander, Schwarz Daniel F, Shi Weilang, Shugart Yin Yao, Stassen Hans H, Sun Yan V, Won Sungho, Wang Wenyi, Wahba Grace, Zagaar Usumah A, Zhao Zhenming

Affiliation

Institut für Medizinische Biometrie und Statistik, Universitätsklinikum Schleswig-Holstein, Universität zu Lübeck, Ratzeburger Allee 160, Lübeck, Germany.

Publication information

Genet Epidemiol. 2007;31 Suppl 1:S51-60. doi: 10.1002/gepi.20280.

DOI: 10.1002/gepi.20280
PMID: 18046765

Abstract

Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers and region-wide association studies using a dense panel of SNPs are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks become increasingly important, three different data sets were provided for the Genetic Analysis Workshop 15, thus allowing examination of various novel and existing data mining methods for both classification and identification of disease susceptibility genes, gene by gene or gene by environment interaction. The approach most often applied in this presentation group was random forests because of its simplicity, elegance, and robustness. It was used for prediction and for screening for interesting SNPs in a first step. The logistic tree with unbiased selection approach appeared to be an interesting alternative to efficiently select interesting SNPs. Machine learning, specifically ensemble methods, might be useful as pre-screening tools for large-scale association studies because they can be less prone to overfitting, can be less computer processor time intensive, can easily include pair-wise and higher-order interactions compared with standard statistical approaches and can also have a high capability for classification. However, improved implementations that are able to deal with hundreds of thousands of SNPs at a time are required.
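The pre-screening idea described in the abstract (rank SNPs with a tree ensemble first, then follow up on the top candidates) can be illustrated with a small sketch. This is not the workshop contributors' implementation and not a full random forest: it is a deliberately simplified stand-in that bags depth-1 trees (stumps) over bootstrap samples with a random subset of SNPs per tree, and scores each SNP by its accumulated Gini impurity decrease. The function names (`simulate`, `best_stump`, `screen_snps`), the 0/1/2 genotype coding, the simulated data, and all parameter values are assumptions made for illustration only. Because stumps split on a single SNP, this sketch does not capture the interaction effects that deeper trees can pick up.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, rows, features):
    """Best single split among `features`; returns (feature, impurity decrease)."""
    parent = gini([y[i] for i in rows])
    best_feat, best_gain = None, 0.0
    for f in features:
        for threshold in (0, 1):  # genotypes coded 0/1/2 (assumed coding)
            left = [y[i] for i in rows if X[i][f] <= threshold]
            right = [y[i] for i in rows if X[i][f] > threshold]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            gain = parent - child
            if gain > best_gain:
                best_feat, best_gain = f, gain
    return best_feat, best_gain


def screen_snps(X, y, n_trees=200, mtry=None, seed=0):
    """Average impurity decrease per SNP over bagged stumps (simplified ensemble)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # common default: sqrt(number of SNPs)
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of individuals
        features = rng.sample(range(p), mtry)        # random SNP subset for this tree
        feat, gain = best_stump(X, y, rows, features)
        if feat is not None:
            importance[feat] += gain
    return [value / n_trees for value in importance]


def simulate(n=300, p=20, causal=3, maf=0.3, seed=1):
    """Toy case-control data: one causal SNP with an additive effect on risk."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(p)]
        risk = 0.2 + 0.3 * genotypes[causal]  # risk 0.2 / 0.5 / 0.8 by genotype
        X.append(genotypes)
        y.append(1 if rng.random() < risk else 0)
    return X, y


X, y = simulate()
importance = screen_snps(X, y)
ranking = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
print("top-ranked SNP indices:", ranking[:5])
```

In a real pre-screening step, the top-ranked SNPs would then be passed to a standard association analysis. The scaling concern raised at the end of the abstract applies here as well: a pure-Python loop like this one is not suited to hundreds of thousands of SNPs.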

Similar articles

1. Data mining, neural nets, trees--problems 2 and 3 of Genetic Analysis Workshop 15.
   Genet Epidemiol. 2007;31 Suppl 1:S51-60. doi: 10.1002/gepi.20280.
2. Screening large-scale association study data: exploiting interactions using random forests.
   BMC Genet. 2004 Dec 10;5:32. doi: 10.1186/1471-2156-5-32.
3. Evaluating the ability of tree-based methods and logistic regression for the detection of SNP-SNP interaction.
   Ann Hum Genet. 2009 May;73(Pt 3):360-9. doi: 10.1111/j.1469-1809.2009.00511.x. Epub 2009 Mar 8.
4. Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers.
   J Anim Breed Genet. 2007 Dec;124(6):377-89. doi: 10.1111/j.1439-0388.2007.00694.x.
5. Identification of SNP interactions using logic regression.
   Biostatistics. 2008 Jan;9(1):187-98. doi: 10.1093/biostatistics/kxm024. Epub 2007 Jun 19.
6. Multigenic modeling of complex disease by random forests.
   Adv Genet. 2010;72:73-99. doi: 10.1016/B978-0-12-380862-2.00004-7.
7. SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies.
   Bioinformatics. 2009 Feb 15;25(4):504-11. doi: 10.1093/bioinformatics/btn652. Epub 2008 Dec 19.
8. Data mining and genetic algorithm based gene/SNP selection.
   Artif Intell Med. 2004 Jul;31(3):183-96. doi: 10.1016/j.artmed.2004.04.002.
9. Detecting AIDS restriction genes: from candidate genes to genome-wide association discovery.
   Vaccine. 2008 Jun 6;26(24):2951-65. doi: 10.1016/j.vaccine.2007.12.054. Epub 2008 Feb 1.
10. Comparison of approaches for machine-learning optimization of neural networks for detecting gene-gene interactions in genetic epidemiology.
    Genet Epidemiol. 2008 May;32(4):325-40. doi: 10.1002/gepi.20307.

Cited by

1. An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat.
   Mach Learn. 2020;109(2):251-277. doi: 10.1007/s10994-019-05848-5. Epub 2019 Oct 23.
2. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data.
   Sci Rep. 2019 Jul 17;9(1):10351. doi: 10.1038/s41598-019-46649-z.
3. Statistical learning approaches in the genetic epidemiology of complex diseases.
   Hum Genet. 2020 Jan;139(1):73-84. doi: 10.1007/s00439-019-01996-9. Epub 2019 May 2.
4. Ensemble learning for detecting gene-gene interactions in colorectal cancer.
   PeerJ. 2018 Oct 29;6:e5854. doi: 10.7717/peerj.5854. eCollection 2018.
5. Do little interactions get lost in dark random forests?
   BMC Bioinformatics. 2016 Mar 31;17:145. doi: 10.1186/s12859-016-0995-8.
6. Implementation of Genomic Prediction in Lolium perenne (L.) Breeding Populations.
   Front Plant Sci. 2016 Feb 12;7:133. doi: 10.3389/fpls.2016.00133. eCollection 2016.
7. Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19.
   BMC Genet. 2016 Feb 3;17 Suppl 2(Suppl 2):1. doi: 10.1186/s12863-015-0315-8.
8. Parallel classification and feature selection in microarray data using SPRINT.
   Concurr Comput. 2014 Mar 25;26(4):854-865. doi: 10.1002/cpe.2928.
9. Association between protein signals and type 2 diabetes incidence.
   Acta Diabetol. 2013 Oct;50(5):697-704. doi: 10.1007/s00592-012-0376-3. Epub 2012 Feb 5.
10. On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data.
    Bioinformatics. 2010 Jul 15;26(14):1752-8. doi: 10.1093/bioinformatics/btq257. Epub 2010 May 26.