Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA.
Department of Statistics, North Carolina State University, Raleigh, North Carolina, USA.
Biom J. 2023 Aug;65(6):e2200029. doi: 10.1002/bimj.202200029. Epub 2023 May 22.
Multivariate heterogeneous responses and heteroskedasticity have attracted increasing attention in recent years. In genome-wide association studies, effective simultaneous modeling of multiple phenotypes would improve statistical power and interpretability. However, a flexible common modeling system for heterogeneous data types can pose computational difficulties. Here we build upon a previous method for multivariate probit estimation that uses a two-stage composite likelihood, which offers favorable computational time while retaining attractive parameter estimation properties. We extend this approach to incorporate multivariate responses of heterogeneous data types (binary and continuous) and possible heteroskedasticity. Although the approach has wide applications, it would be particularly useful for genomics, precision medicine, or individual biomedical prediction. Using a genomics example, we explore statistical power and confirm that the approach performs well in terms of hypothesis testing and coverage percentages under a wide variety of settings. The approach has the potential to better leverage genomics data and provide interpretable inference for pleiotropy, in which a locus is associated with multiple traits.
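The two-stage idea mentioned in the abstract can be illustrated with a minimal sketch for a single binary-continuous pair. This is not the authors' implementation. It assumes an intercept-only latent Gaussian model with constant variance (no covariates, no heteroskedasticity), and all function names (`simulate`, `stage_one`, `pair_loglik`, `stage_two`) are hypothetical. Stage one estimates the marginal parameters of each response separately; stage two holds them fixed and maximizes the pair likelihood over the latent correlation alone, here by a simple grid search.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()  # standard normal, used for the probit link


def simulate(n, mu1, mu2, sigma, rho, seed=0):
    """Draw (binary, continuous) pairs from an assumed latent Gaussian model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        e2 = rng.gauss(0.0, 1.0)
        # e1 has unit variance and correlation rho with e2
        e1 = rho * e2 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y_bin = 1 if mu1 + e1 > 0 else 0   # thresholded latent variable
        y_cont = mu2 + sigma * e2          # observed continuous response
        data.append((y_bin, y_cont))
    return data


def stage_one(data):
    """Stage 1: fit each margin on its own (intercept-only models)."""
    p_hat = fmean([y for y, _ in data])
    mu1 = STD_NORMAL.inv_cdf(p_hat)        # probit intercept; needs 0 < p_hat < 1
    cont = [c for _, c in data]
    return mu1, fmean(cont), pstdev(cont)  # Gaussian MLEs for mean and sd


def pair_loglik(rho, data, mu1, mu2, sigma):
    """Log-likelihood of the binary response given the continuous one.

    The marginal density of the continuous response does not involve rho,
    so it is omitted from the stage-2 objective.
    """
    scale = math.sqrt(1.0 - rho ** 2)
    total = 0.0
    for y_bin, y_cont in data:
        r = (y_cont - mu2) / sigma
        p = STD_NORMAL.cdf((mu1 + rho * r) / scale)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p if y_bin else 1.0 - p)
    return total


def stage_two(data, mu1, mu2, sigma, grid_size=199):
    """Stage 2: maximize over rho with the marginal estimates held fixed."""
    grid = [-0.99 + 1.98 * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda rho: pair_loglik(rho, data, mu1, mu2, sigma))


if __name__ == "__main__":
    sample = simulate(2000, mu1=0.3, mu2=1.0, sigma=2.0, rho=0.5, seed=1)
    m1, m2, s = stage_one(sample)
    print("marginal estimates:", m1, m2, s)
    print("latent correlation estimate:", stage_two(sample, m1, m2, s))
```

With more than two responses, a pairwise composite likelihood would sum such pair terms over all response pairs in stage two. Adding covariates such as a genotype, or a variance model for heteroskedasticity, would change the stage-one fits but not the overall two-stage structure.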