MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK.
Pfizer Worldwide Research & Development, Genome Sciences & Technologies, Cambridge, MA 02142, USA.
Nucleic Acids Res. 2019 Jan 10;47(1):e3. doi: 10.1093/nar/gky837.
Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the 'Prioritization of candidate causal Genes at Molecular QTLs' (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of 'true positive' causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
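The two benchmarking figures in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not ProGeM's implementation: the function names, the gene names and the coordinates are all hypothetical. It shows (a) that three false positives for every true positive corresponds to a precision of 1/4 for cis-eQTL-based assignments, and (b) what choosing the gene "nearest to the sentinel variant" could look like, assuming each gene is a simple start-end interval on the same chromosome as the variant.

```python
from fractions import Fraction


def precision_from_ratio(false_pos_per_true_pos):
    """Precision = TP / (TP + FP), given the number of FPs per single TP."""
    return Fraction(1, 1 + false_pos_per_true_pos)


def nearest_gene(sentinel_pos, genes):
    """Return the name of the gene closest to the sentinel variant.

    genes: list of (name, start, end) tuples on the same chromosome
    (hypothetical toy representation, not ProGeM's data model).
    A variant inside a gene interval has distance 0.
    """
    def distance(gene):
        _, start, end = gene
        if start <= sentinel_pos <= end:
            return 0
        return min(abs(sentinel_pos - start), abs(sentinel_pos - end))

    return min(genes, key=distance)[0]


# Abstract: three false positives for every one true positive (cis-eQTL data).
eqtl_precision = precision_from_ratio(3)

# Hypothetical toy locus; names and coordinates are invented for illustration.
toy_genes = [
    ("GENE_A", 1000, 2000),
    ("GENE_B", 5000, 6000),
    ("GENE_C", 9000, 9500),
]
closest = nearest_gene(5200, toy_genes)

print(f"cis-eQTL precision: {float(eqtl_precision):.2f}")
print(f"nearest gene to position 5200: {closest}")
```

In other words, under the reported ratio only about one in four cis-eQTL-derived candidates would be a true positive, whereas the nearest gene matched the causal candidate at 69% of the benchmarked loci.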