Integrative analysis of individual-level data and high-dimensional summary statistics.

Author information

Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA.

School of Statistics and Data Science, Nankai University, Tianjin 300071, China.

Publication information

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad156.

DOI:10.1093/bioinformatics/btad156

PMID:36964712

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10361352/

Abstract

MOTIVATION

Researchers usually conduct statistical analyses based on models built on raw data collected from individual participants (individual-level data). There is a growing interest in enhancing inference efficiency by incorporating aggregated summary information from other sources, such as summary statistics on genetic markers' marginal associations with a given trait generated from genome-wide association studies. However, combining high-dimensional summary data with individual-level data using existing integrative procedures can be challenging due to various numeric issues in optimizing an objective function over a large number of unknown parameters.

RESULTS

We develop a procedure to improve the fitting of a targeted statistical model by leveraging external summary data for more efficient statistical inference (both effect estimation and hypothesis testing). To make this procedure scalable to high-dimensional summary data, we propose a divide-and-conquer strategy by breaking the task into easier parallel jobs, each fitting the targeted model by integrating the individual-level data with a small proportion of summary data. We obtain the final estimates of model parameters by pooling results from multiple fitted models through the minimum distance estimation procedure. We improve the procedure for a general class of additive models commonly encountered in genetic studies. We further expand these two approaches to integrate individual-level and high-dimensional summary data from different study populations. We demonstrate the advantage of the proposed methods through simulations and an application to the study of the effect on pancreatic cancer risk by the polygenic risk score defined by BMI-associated genetic markers.
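The pooling step can be illustrated with a minimal sketch of classical minimum distance estimation. This is not the MetaGIM API and not necessarily the authors' exact estimator: the function name `minimum_distance_pool` and its inputs are hypothetical, and the joint covariance of the per-job estimates is assumed to be supplied by the caller. In practice the per-job estimates are correlated because every job reuses the same individual-level data, so that covariance would generally not be block-diagonal.

```python
import numpy as np

def minimum_distance_pool(estimates, joint_cov):
    """Pool K estimates of the same p-vector by minimum distance estimation.

    estimates: shape (K, p), one row per parallel job (hypothetical input).
    joint_cov: shape (K*p, K*p), covariance of the stacked estimates
               (assumed known or estimated elsewhere).
    Returns the pooled estimate and its covariance.
    """
    estimates = np.asarray(estimates, dtype=float)
    K, p = estimates.shape
    stacked = estimates.reshape(K * p)      # (theta_1, ..., theta_K) stacked
    A = np.tile(np.eye(p), (K, 1))          # maps theta to its K stacked copies
    W = np.linalg.inv(joint_cov)            # inverse covariance as weight matrix
    info = A.T @ W @ A
    # Minimizer of (stacked - A theta)' W (stacked - A theta)
    pooled = np.linalg.solve(info, A.T @ W @ stacked)
    pooled_cov = np.linalg.inv(info)
    return pooled, pooled_cov

# Two jobs estimating one scalar parameter, treated as uncorrelated here
# purely to keep the illustration small.
pooled, pooled_cov = minimum_distance_pool([[1.0], [3.0]], np.diag([1.0, 3.0]))
```

When the joint covariance is diagonal, as in this toy call, the result reduces to the familiar inverse-variance weighted average.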

AVAILABILITY AND IMPLEMENTATION

An R package is available at https://github.com/fushengstat/MetaGIM.
