The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases.

Author information

Heidema A Geert, Boer Jolanda M A, Nagelkerke Nico, Mariman Edwin C M, van der A Daphne L, Feskens Edith J M

Affiliation

Centre for Nutrition and Health, National Institute for Public Health and the Environment, PO Box 1, 3720 BA Bilthoven, The Netherlands.

Publication information

BMC Genet. 2006 Apr 21;7:23. doi: 10.1186/1471-2156-7-23.

Abstract

Genetic epidemiologists have taken up the challenge of identifying genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with the methods available to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors and disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis; neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN); and several non-parametric methods, namely the set association approach, the combinatorial partitioning method (CPM), the restricted partitioning method (RPM), the multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. They are therefore less useful than the non-parametric methods for association studies with large numbers of predictor variables. GPNN, on the other hand, may be a useful approach for selecting and modeling important predictors, but its ability to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and the random forests approach can handle a large number of predictors and are useful for reducing them to a subset with an important contribution to disease. The combinatorial methods give more insight into combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. Because the non-parametric methods have different strengths and weaknesses, we conclude that, for genetic association studies using the case-control design, applying a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy for finding the important genes and interaction patterns involved in complex diseases.
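
The abstract names MDR without describing its mechanics. As a rough illustration only, the sketch below follows the commonly described core step of MDR: each multi-locus genotype combination is pooled into a high-risk or low-risk group according to its case-to-control ratio, and the resulting one-dimensional classifier is scored. The function names (`mdr_labels`, `balanced_accuracy`, `best_pair`), the 0/1/2 genotype coding, the ratio threshold of 1.0 and the synthetic data are all assumptions made for this example. Cross-validation and permutation testing, which a real MDR analysis would include, are omitted.

```python
from collections import Counter
from itertools import combinations, product


def mdr_labels(genotypes, status, snps, threshold=1.0):
    """Label each observed genotype cell high-risk (True) or low-risk (False)."""
    cases, controls = Counter(), Counter()
    for row, is_case in zip(genotypes, status):
        cell = tuple(row[i] for i in snps)
        if is_case:
            cases[cell] += 1
        else:
            controls[cell] += 1
    # A cell is high-risk when its case count exceeds threshold * control count.
    return {cell: cases[cell] > threshold * controls[cell]
            for cell in set(cases) | set(controls)}


def balanced_accuracy(genotypes, status, snps, labels):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp = tn = fp = fn = 0
    for row, is_case in zip(genotypes, status):
        predicted_case = labels[tuple(row[i] for i in snps)]
        if is_case and predicted_case:
            tp += 1
        elif is_case:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def best_pair(genotypes, status):
    """Exhaustively score every SNP pair and return the best-scoring one."""
    n_snps = len(genotypes[0])

    def score(pair):
        labels = mdr_labels(genotypes, status, pair)
        return balanced_accuracy(genotypes, status, pair, labels)

    return max(combinations(range(n_snps), 2), key=score)


# Synthetic toy data: three SNPs coded 0/1/2, one row per genotype combination.
# Disease status depends only on an interaction between SNP 0 and SNP 1.
genotypes = list(product((0, 1, 2), repeat=3))
status = [(a == 1) != (b == 1) for a, b, _ in genotypes]

print(best_pair(genotypes, status))  # (0, 1)
```

In this toy dataset neither SNP 0 nor SNP 1 separates cases from controls well on its own, yet the pair does so perfectly, which is the kind of interaction pattern the combinatorial methods are meant to expose. The exhaustive search over pairs also hints at why such methods become expensive as the number of predictors grows.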
