Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94143-0560, USA.
Genetics. 2011 Sep;189(1):305-16. doi: 10.1534/genetics.111.129221. Epub 2011 Jul 29.
In genetic studies, many interesting traits, including growth curves and skeletal shape, have temporal or spatial structure. They are better treated as curves or function-valued traits. Identification of genetic loci contributing to such traits is facilitated by specialized methods that explicitly address the function-valued nature of the data. Current methods for mapping function-valued traits are mostly likelihood-based, requiring specification of the distribution and error structure. However, such specification is difficult or impractical in many scenarios. We propose a general functional regression approach based on estimating equations that is robust to misspecification of the covariance structure. Estimation is based on a two-step least-squares algorithm, which is fast and applicable even when the number of time points exceeds the number of samples. It is also flexible because it rests on a general linear functional model: changing the number of covariates does not require a new set of formulas and programs. In addition, many useful extensions are straightforward. For example, we can accommodate incomplete genotype data, and the algorithm can be trivially parallelized. The framework is an attractive alternative to likelihood-based methods when the covariance structure of the data is not known. It provides a good compromise between model simplicity, statistical efficiency, and computational speed. We illustrate our method and its advantages using circadian mouse behavioral data.
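The abstract names a two-step least-squares estimator for a linear functional model but does not spell out the two steps. The sketch below assumes one common reading of such a scheme for the model y_i(t) = x_i' beta(t) + e_i(t): step 1 fits ordinary least squares separately at each time point, and step 2 smooths each pointwise coefficient curve by a least-squares fit to a polynomial basis in t. The function names `solve`, `ols`, and `two_step_fit`, the polynomial basis, and the simulated data are all hypothetical illustrations, not the authors' implementation; the sketch also omits the estimating-equation standard errors, the handling of incomplete genotypes, and the genome scan. It does illustrate two properties claimed above: each time point is fitted independently (so step 1 is trivially parallel), and each pointwise fit only needs more samples than covariates, so the number of time points may exceed the number of samples.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(x: Matrix, y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def two_step_fit(x: Matrix, y: Matrix, times: Sequence[float], degree: int) -> List[List[float]]:
    """Hypothetical two-step estimator for y_i(t) = x_i' beta(t) + e_i(t).

    x: n-by-p design (e.g. intercept + genotype code); y: n-by-T responses;
    times: the T time points. Returns, for each of the p covariates, the
    polynomial coefficients of its smoothed coefficient curve beta_k(t).
    """
    n_times = len(times)
    # Step 1: pointwise OLS, one independent fit per time point (could run in parallel).
    pointwise = [ols(x, [row[t] for row in y]) for t in range(n_times)]
    # Step 2: smooth each coefficient curve by least squares on 1, t, ..., t^degree.
    basis = [[tt ** d for d in range(degree + 1)] for tt in times]
    p = len(x[0])
    return [ols(basis, [pointwise[t][k] for t in range(n_times)]) for k in range(p)]


# Usage on simulated, noise-free data: 4 samples observed at 10 time points (T > n).
times = [i / 9 for i in range(10)]
genotype = [0, 1, 2, 1]                      # hypothetical additive genotype codes
x = [[1.0, float(g)] for g in genotype]      # intercept + genotype
# True curves: beta_0(t) = 1 + 2t, beta_1(t) = 0.5 t^2
y = [[(1 + 2 * t) + g * (0.5 * t ** 2) for t in times] for g in genotype]
coefs = two_step_fit(x, y, times, degree=2)
# coefs is approximately [[1, 2, 0], [0, 0, 0.5]]
```

Because the simulated data contain no noise, the sketch recovers the true coefficient curves up to floating-point error; this is a correctness check of the code, not evidence about the method's statistical efficiency. In practice one would replace the hand-written solver with `numpy.linalg.lstsq`, and a spline basis is a more usual smoother than a global polynomial; the pure-Python version is used only to keep the example dependency-free.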