Department of Biostatistics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA.
Bioinformatics. 2009 Aug 15;25(16):2013-9. doi: 10.1093/bioinformatics/btp357. Epub 2009 Jun 15.
In some applications, prior biological knowledge can be used to define the biologically most interesting pattern of association between a genomic variable and multiple endpoint variables. However, to our knowledge, no existing statistical procedure is designed to detect a specific pattern of association with multiple endpoint variables.
Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific, biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values of the statistics that characterize the association of each endpoint variable with a genomic variable. The test statistic is defined as the dot product of the vector of observed association statistics and the vector of most interesting values of those statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of association statistics onto the vector of most interesting values. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of association than classical multivariate procedures, individual endpoint analyses, or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis.
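The dot-product statistic and its permutation test can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It assumes Pearson correlation as the association statistic for every endpoint, a one-sided permutation p-value obtained by shuffling the genomic variable across subjects, and toy data invented for the illustration. The names `pearson`, `promise_stat`, `promise_perm_test` and `interest` are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def promise_stat(x, endpoints, interest):
    """Dot product of the observed association statistics with the interest vector.

    ASSUMPTION: each association statistic is a Pearson correlation between
    the genomic variable x and one endpoint variable.
    """
    return sum(w * pearson(x, y) for w, y in zip(interest, endpoints))


def promise_perm_test(x, endpoints, interest, n_perm=1000, seed=0):
    """One-sided permutation test of the dot-product statistic.

    Shuffling x breaks its link to all endpoints at once while leaving the
    dependence among the endpoints themselves intact.
    """
    rng = random.Random(seed)
    observed = promise_stat(x, endpoints, interest)
    x_perm = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if promise_stat(x_perm, endpoints, interest) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (invented): one genomic variable and two endpoints for 8 subjects.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y1 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]  # rises with x
y2 = [9.0, 8.2, 6.9, 6.1, 5.2, 3.8, 3.1, 1.9]     # falls with x

# Hypothetical pattern of interest: positive association with the first
# endpoint and negative association with the second.
interest = [1, -1]

observed, p_value = promise_perm_test(x, [y1, y2], interest)
print("statistic:", observed, "permutation p-value:", p_value)
```

In this sketch the interest vector encodes only the signs of the desired associations, so a genomic variable scores highly only when both correlations point in the specified directions; a variable that is strongly associated with both endpoints in the same direction would score near zero.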
Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org.