Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA.
Department of Epidemiology, Public Health College, Harbin Medical University, Harbin, Heilongjiang 150081, China.
Nucleic Acids Res. 2018 Apr 6;46(6):e32. doi: 10.1093/nar/gkx1280.
High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduces strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools for cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases using sequencing data from multiple sources, comprising 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females). XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
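The abstract does not say how Type I error inflation was measured. A common summary statistic for it in association studies is the genomic inflation factor, lambda_GC: the median observed test statistic divided by the median expected under the null. The sketch below is illustrative only and is not taken from XPAT. The function name `genomic_inflation` and the assumption that each p-value comes from a two-sided test equivalent to a 1-degree-of-freedom chi-square test are ours.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained from the standard normal quantile: chi2 = z**2.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for a list of p-values (hypothetical helper, not XPAT's API).

    Values near 1 suggest well-calibrated tests; values well above 1 suggest
    inflated Type I error, for example from technological stratification.
    """
    norm = NormalDist()
    stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range (0, 1]: {p}")
        # Convert the p-value back to a 1-df chi-square statistic.
        stats.append(norm.inv_cdf(1.0 - p / 2.0) ** 2)
    return median(stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a perfectly calibrated null distribution.
    uniform = [(i + 0.5) / 1001 for i in range(1001)]
    print(round(genomic_inflation(uniform), 3))  # 1.0

    # Squaring shifts p-values toward zero, mimicking systematic inflation.
    inflated = [p * p for p in uniform]
    print(genomic_inflation(inflated) > 1.0)  # True
```

Under these assumptions, comparing lambda_GC before and after a correction step is one way to check whether inflation has been reduced; quantile-quantile plots of the same p-values are the usual visual counterpart.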