Ho Yen-Yi, Parmigiani Giovanni, Louis Thomas A, Cope Leslie M
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205, USA.
Biometrics. 2011 Mar;67(1):133-41. doi: 10.1111/j.1541-0420.2010.01440.x.
In 2002, Ker-Chau Li introduced the liquid association measure to characterize three-way interactions between genes, and developed a computationally efficient estimator that can be used to screen gene expression microarray data for such interactions. That study, and others published since then, have established the biological validity of the method, and clearly demonstrated it to be a useful tool for the analysis of genomic data sets. To build on this work, we have sought a parametric family of multivariate distributions with the flexibility to model the full range of trivariate dependencies encompassed by liquid association. Such a model could situate liquid association within a formal inferential theory. In this article, we describe such a family of distributions, a trivariate, conditional normal model having Gaussian univariate marginal distributions, and in fact including the trivariate Gaussian family as a special case. Perhaps the most interesting feature of the distribution is that the parameterization naturally parses the three-way dependence structure into a number of distinct, interpretable components. One of these components is very closely aligned to liquid association, and is developed as a measure we call modified liquid association. We develop two methods for estimating this quantity, and propose statistical tests for the existence of this type of dependence. We evaluate these inferential methods in a set of simulations and illustrate their use in the analysis of publicly available experimental data.
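For readers unfamiliar with the original measure, the following is a minimal sketch of the moment estimator usually attributed to Li (2002). It is not the modified liquid association developed in this article, and the abstract itself does not give the formula. It assumes the standard formulation: X and Y are standardized to mean 0 and variance 1, Z is replaced by its normal scores, and liquid association is estimated by the sample mean of the product XYZ. The function names (`normal_scores`, `standardize`, `liquid_association`) and the simulated data are illustrative choices, not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(v):
    """Rank-based normal-score transform (ties broken by position)."""
    v = np.asarray(v, dtype=float)
    ranks = np.argsort(np.argsort(v)) + 1      # ranks 1..n
    probs = ranks / (len(v) + 1)               # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in probs])


def standardize(v):
    """Center to mean 0 and scale to unit (population) variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def liquid_association(x, y, z):
    """Sample mean of X*Y*Z after standardizing X, Y and normal-scoring Z.

    Assumed form of the Li (2002) estimator; see the hedges in the lead-in.
    """
    return float(np.mean(standardize(x) * standardize(y) * normal_scores(z)))


# Hypothetical simulated data: the correlation between x and y is tanh(z),
# so it changes sign with z while each marginal stays standard normal.
rng = np.random.default_rng(0)
n = 2000
z = rng.standard_normal(n)
e1 = rng.standard_normal(n)
e2 = rng.standard_normal(n)
rho = np.tanh(z)
x = e1
y = rho * e1 + np.sqrt(1 - rho**2) * e2

la_dependent = liquid_association(x, y, z)   # x-y correlation varies with z
la_null = liquid_association(e1, e2, z)      # three independent variables
print(la_dependent, la_null)
```

In this simulation the first value should be clearly positive and the second close to zero, which is the kind of contrast a screening procedure based on the estimator relies on.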