A proximal LAVA method for genome-wide association and prediction of traits with mixed inheritance patterns.

Affiliation

Research Unit of Mathematical Sciences, University of Oulu, Oulu, Finland.

Publication information

BMC Bioinformatics. 2021 Oct 26;22(1):523. doi: 10.1186/s12859-021-04436-6.

DOI:10.1186/s12859-021-04436-6

PMID:34702175

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8547073/

Abstract

BACKGROUND

The genetic basis of phenotypic traits is highly variable and is usually divided into mono-, oligo- and polygenic inheritance classes. Relatively few traits are known to be monogenic or oligogenic; the majority are considered to have a polygenic background. To what extent mixtures of these classes occur is unknown. The rapid advancement of genomic techniques makes it possible to directly map large numbers of genomic markers (GWAS) and predict unknown phenotypes (GWP). Most multi-marker methods for GWAS and GWP fall into one of two regularization frameworks. The first framework is based on $\ell_1$-norm regularization (e.g. the LASSO) and is suitable for mono- and oligogenic traits, whereas the second framework regularizes with the $\ell_2$-norm (e.g. ridge regression; RR) and is thereby favourable for polygenic traits. A general framework for mixed inheritance is lacking.
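
As a point of reference, the two frameworks can be written as penalized least-squares problems. The notation here (phenotype vector $y$, marker matrix $X$ with $n$ rows, effect vector $\beta$, penalty weight $\lambda$, and the $1/(2n)$ scaling) is an assumed, conventional one and is not taken from the abstract:

$$
\hat\beta_{\mathrm{LASSO}} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1,
\qquad
\hat\beta_{\mathrm{RR}} = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \frac{\lambda}{2}\lVert\beta\rVert_2^2
$$

The $\ell_1$ penalty sets many effects exactly to zero, which suits a few large loci; the squared $\ell_2$ penalty shrinks all effects without zeroing them, which suits many small loci.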

RESULTS

We have developed a proximal operator algorithm based on the recent LAVA regularization method that jointly performs $\ell_1$- and $\ell_2$-norm regularization. The algorithm is built on the alternating direction method of multipliers and proximal translation mapping (LAVA ADMM). When evaluated on the simulated QTLMAS2010 data, the LAVA ADMM together with Bayesian optimization of the regularization parameters provides an efficient approach with lower test prediction mean-squared-error (65.89) than the LASSO (66.11), ridge regression (83.41) and elastic net (EN; 66.11). For the real pig data, the test MSE of the LAVA ADMM is 0.850, compared to 0.875, 0.853 and 0.853 for the LASSO, RR and EN, respectively.
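
The joint penalty can be illustrated with a minimal sketch. It assumes the usual LAVA decomposition of the marker effects into a dense part `b` (squared $\ell_2$ penalty) plus a sparse part `d` ($\ell_1$ penalty) and, for brevity, uses a plain proximal gradient loop rather than the ADMM with proximal translation mapping; it is not the authors' Julia implementation. The function names, the $1/(2n)$ loss scaling, the step size and the toy data are all assumptions chosen for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v| (l1 penalty): shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ridge_shrink(v, t):
    """Proximal operator of (t / 2) * v**2 (squared l2 penalty)."""
    return v / (1.0 + t)


def lava_prox_gradient(X, y, lam1, lam2, step, n_iter=500):
    """Proximal gradient sketch (assumed formulation, not the paper's ADMM) for

        1/(2n) * ||y - X(b + d)||^2 + lam1 * ||d||_1 + (lam2 / 2) * ||b||^2

    X is a list of n rows with p entries each; y is a list of length n.
    The step must not exceed 1 / (2 * largest eigenvalue of X'X / n).
    Returns (b, d): the dense and the sparse effect vectors.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p  # dense (polygenic) part
    d = [0.0] * p  # sparse (major-effect) part
    for _ in range(n_iter):
        # Residual of the current fit with total effect b + d.
        resid = [sum(X[i][j] * (b[j] + d[j]) for j in range(p)) - y[i]
                 for i in range(n)]
        # Gradient of the smooth loss; it is identical for b and d.
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n
                for j in range(p)]
        # Gradient step followed by the penalty-specific proximal step.
        b = [ridge_shrink(b[j] - step * grad[j], step * lam2) for j in range(p)]
        d = [soft_threshold(d[j] - step * grad[j], step * lam1) for j in range(p)]
    return b, d


# Toy data (hypothetical): identity design, one large and two small effects.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [5.0, 0.3, -0.2]
dense, sparse = lava_prox_gradient(X, y, lam1=0.1, lam2=1.0 / 3.0, step=1.0)
print("dense :", [round(v, 3) for v in dense])
print("sparse:", [round(v, 3) for v in sparse])
```

On this toy problem the large first effect should be absorbed mainly by the sparse part, while the two small effects remain in the dense part with the sparse part at zero. That is the behaviour that motivates combining the two penalties for traits with mixed inheritance.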

CONCLUSIONS

This study presents the LAVA ADMM, which is capable of jointly modelling monogenic major genetic effects and polygenic minor genetic effects and can be used for both genome-wide association and prediction purposes. The statistical evaluations based on both simulated and real pig data sets show that the LAVA ADMM has better prediction properties than the LASSO, RR and EN. Julia code for the LAVA ADMM is available at: https://github.com/patwa67/LAVAADMM .
