两种机器学习算法在存在协变量情况下于基因关联研究中的应用。

Application of two machine learning algorithms to genetic association studies in the presence of covariates.

作者信息

Nonyane Bareng A S, Foulkes Andrea S

机构信息

Division of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst, MA, USA.

出版信息

BMC Genet. 2008 Nov 14;9:71. doi: 10.1186/1471-2156-9-71.

DOI:10.1186/1471-2156-9-71

PMID:19014573

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC2620353/

Abstract

BACKGROUND

Population-based investigations aimed at uncovering genotype-trait associations often involve high-dimensional genetic polymorphism data as well as information on multiple environmental and clinical parameters. Machine learning (ML) algorithms offer a straightforward analytic approach for selecting subsets of these inputs that are most predictive of a pre-defined trait. The performance of these algorithms, however, in the presence of covariates is not well characterized.

METHODS AND RESULTS

In this manuscript, we investigate two approaches: Random Forests (RFs) and Multivariate Adaptive Regression Splines (MARS). Through multiple simulation studies, the performance under several underlying models is evaluated. An application to a cohort of HIV-1 infected individuals receiving anti-retroviral therapies is also provided.

CONCLUSION

Consistent with more traditional regression modeling theory, our findings highlight the importance of considering the nature of underlying gene-covariate-trait relationships before applying ML algorithms, particularly when there is potential confounding or effect mediation.

摘要

背景

旨在揭示基因型与性状关联的基于人群的调查通常涉及高维基因多态性数据以及多个环境和临床参数的信息。机器学习（ML）算法提供了一种直接的分析方法，用于选择这些输入中对预定义性状最具预测性的子集。然而，这些算法在存在协变量的情况下的性能尚未得到很好的表征。

方法与结果

在本手稿中，我们研究了两种方法：随机森林（RF）和多元自适应回归样条（MARS）。通过多项模拟研究，评估了几种潜在模型下的性能。还提供了对一组接受抗逆转录病毒疗法的HIV-1感染者的应用。

结论

与更传统的回归建模理论一致，我们的研究结果强调了在应用ML算法之前考虑潜在基因-协变量-性状关系性质的重要性，特别是在存在潜在混杂或效应介导的情况下。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

两种机器学习算法在存在协变量情况下于基因关联研究中的应用。

Application of two machine learning algorithms to genetic association studies in the presence of covariates.

作者信息

机构信息

出版信息

BACKGROUND

METHODS AND RESULTS

CONCLUSION

背景

方法与结果

结论

相似文献

引用本文的文献

本文引用的文献

两种机器学习算法在存在协变量情况下于基因关联研究中的应用。

Application of two machine learning algorithms to genetic association studies in the presence of covariates.

作者信息

机构信息

出版信息

BACKGROUND

METHODS AND RESULTS

CONCLUSION

背景

方法与结果

结论

相似文献

引用本文的文献

本文引用的文献