Ljungberg Kajsa, Holmgren Sverker, Carlborg Orjan
Department of Scientific Computing, Information Technology, Uppsala University, Box 337, SE-751 05 Uppsala, Sweden.
J Comput Biol. 2002;9(6):793-804. doi: 10.1089/10665270260518272.
Rapid advances in molecular genetics push the need for efficient data analysis. Advanced algorithms are necessary for extracting all possible information from large experimental data sets. We present a general linear algebra framework for quantitative trait loci (QTL) mapping, using both linear regression and maximum likelihood estimation. The formulation simplifies future comparisons between the methods as well as theoretical analyses of them. We show how the common structure of QTL analysis models can be used to improve the kernel algorithms, drastically reducing the computational effort while retaining the original analysis results. We have evaluated our new algorithms on data sets originating from two large F2 populations of domestic animals. Using an updating approach, we show that a reduction of 1-3 orders of magnitude in computational demand can be achieved for matrix factorizations. For interval-mapping/composite-interval-mapping settings using a maximum likelihood model, we also show how to use the original EM algorithm instead of the ECM approximation, significantly improving the convergence and further reducing the computational time. The algorithmic improvements make it feasible to perform analyses that have previously been deemed impractical or even impossible. For example, using the new algorithms, it is reasonable to perform permutation testing using exhaustive search on populations of 200 individuals using an epistatic two-QTL model.
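The abstract does not spell out the updating scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a linear regression scan in which each genome position shares the same fixed design columns (mean, cofactors) and differs only in a few position-specific columns. The function names `rss_scan` and `rss_naive`, the array layout, and the assumption that every combined design matrix has full column rank are all hypothetical choices made for illustration.

```python
import numpy as np

def rss_scan(fixed, candidates, y):
    """Residual sum of squares of y ~ [fixed, candidates[k]] for each position k.

    fixed:      (n, p) columns shared by every model (assumed full column rank)
    candidates: (m, n, d) position-specific columns, one (n, d) block per position
    y:          (n,) phenotype vector
    """
    # Factor the shared columns once; q is an orthonormal basis for their span.
    q, _ = np.linalg.qr(fixed)
    # Remove the shared part from y once, outside the loop.
    y_res = y - q @ (q.T @ y)
    rss_fixed_only = float(y_res @ y_res)
    out = np.empty(len(candidates))
    for k, z in enumerate(candidates):
        # Only the d position-specific columns are processed per position.
        z_res = z - q @ (q.T @ z)
        qz, _ = np.linalg.qr(z_res)
        proj = qz.T @ y_res
        out[k] = rss_fixed_only - float(proj @ proj)
    return out

def rss_naive(fixed, candidates, y):
    """Reference version: solve the full least-squares problem at every position."""
    out = np.empty(len(candidates))
    for k, z in enumerate(candidates):
        x = np.hstack([fixed, z])
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        r = y - x @ beta
        out[k] = float(r @ r)
    return out
```

Both functions should return the same residual sums of squares up to rounding error. The difference is that `rss_scan` factors the shared columns once, while `rss_naive` refactors the whole design matrix at each position. This is the kind of saving that reusing common model structure can give, although the magnitude reported in the abstract refers to the authors' own method and data.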