Regression based fast multi-trait genome-wide QTL analysis.

Affiliations

Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.

Institute of Biological Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.

Publication information

J Bioinform Comput Biol. 2021 Feb;19(1):2050044. doi: 10.1142/S0219720020500444. Epub 2021 Jan 20.

DOI:10.1142/S0219720020500444

PMID:33472570

Abstract

Multivariate simple interval mapping (SIM) is one of the most popular approaches for multiple quantitative trait locus (QTL) analysis. Both maximum likelihood (ML) and least squares (LS) multivariate regression (MVR) are widely used for multi-trait SIM. ML-based MVR (MVR-ML) relies on the iterative expectation-maximization (EM) algorithm, which makes it complex and time-consuming. Although LS-based MVR (MVR-LS) is not iterative, computing its likelihood ratio (LR) statistic is also complex and time-consuming. We introduce a new approach for multi-trait QTL analysis, called FastMtQTL, based on the assumption that the phenotypic observations follow a multivariate normal distribution. The proposed method identifies almost the same QTL positions as the existing methods while requiring less computation time, because its LR statistic is simpler to calculate: it is computed using only the sample variance-covariance matrix of the phenotypes and the conditional probabilities of the QTL genotypes given the marker genotypes. This reduction in computation time is especially advantageous for large QTL mapping datasets, that is, when there are many phenotypes and individuals and the markers are very dense.
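To make the role of the LR statistic concrete, here is a minimal sketch of a least-squares multi-trait LR statistic at a single putative QTL position. It is not the authors' FastMtQTL implementation. It assumes a backcross (two QTL genotypes), a Haley-Knott style regression of all traits on the conditional probability p_i that individual i carries one QTL genotype given its markers (assumed to be precomputed), and multivariate normal errors. Under these assumptions LR = n ln(|R0| / |R1|), where R0 and R1 are the residual sums-of-squares-and-cross-products matrices of the intercept-only and QTL models. The names `det`, `hk_lr_backcross` and `lod` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def det(a: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]  # work on a copy so the input is not modified
    n = len(m)
    result = 1.0
    for col in range(n):
        # Choose the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def hk_lr_backcross(y: Sequence[Sequence[float]], p: Sequence[float]) -> float:
    """Hypothetical LS (Haley-Knott style) LR statistic at one position.

    y[i] holds the t phenotypes of individual i; p[i] is the assumed,
    precomputed probability of one QTL genotype given the flanking markers.
    """
    n, t = len(y), len(y[0])
    ybar = [sum(row[j] for row in y) / n for j in range(t)]
    pbar = sum(p) / n
    yc = [[row[j] - ybar[j] for j in range(t)] for row in y]
    pc = [pi - pbar for pi in p]
    # Null model (intercept only): residual SSCP = centered SSCP of phenotypes.
    r0 = [[sum(row[j] * row[k] for row in yc) for k in range(t)] for j in range(t)]
    # Cross-products of centered phenotypes with centered probabilities.
    s_yp = [sum(yc[i][j] * pc[i] for i in range(n)) for j in range(t)]
    s_pp = sum(v * v for v in pc)
    # QTL model (intercept + p): subtract the SSCP explained by the regression.
    r1 = [[r0[j][k] - s_yp[j] * s_yp[k] / s_pp for k in range(t)] for j in range(t)]
    # The n**t scaling of the ML covariance estimates cancels in the ratio.
    return n * math.log(det(r0) / det(r1))


def lod(lr: float) -> float:
    """Convert an LR statistic to a LOD score."""
    return lr / (2.0 * math.log(10.0))
```

In a genome scan, `hk_lr_backcross` would be evaluated at each position on a grid and the peaks of `lod` inspected. By the matrix determinant lemma, |R1| / |R0| = 1 - s_yp' R0^{-1} s_yp / s_pp, so in this simplified setting the statistic depends only on the phenotype SSCP matrix and its cross-products with the conditional probabilities; whether FastMtQTL reduces to exactly this form is not stated in the abstract.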
