Dorans Elizabeth, Jagadeesh Karthik, Dey Kushal, Price Alkes L
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
PhD Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA.
Nat Genet. 2025 Jun 12. doi: 10.1038/s41588-025-02220-3.
Methods that analyze single-cell paired RNA sequencing (RNA-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) multiome data have shown promise in linking regulatory elements to genes. However, existing methods exhibit low concordance and do not capture the effects of genomic distance. We propose pgBoost, an integrative modeling framework that trains a non-linear combination of existing linking strategies (including genomic distance) on expression quantitative trait locus (eQTL) data to assign a probabilistic score to each candidate single-nucleotide polymorphism-gene link. pgBoost attained higher enrichment than existing methods for evaluation sets derived from eQTL, activity-by-contact, CRISPR and genome-wide association study (GWAS) data. We further determined that restricting pgBoost to features from a focal cell type improved power to identify links relevant to that cell type. We highlight several examples in which pgBoost linked fine-mapped GWAS variants to experimentally validated or biologically plausible target genes that were not implicated by other methods. In conclusion, a non-linear combination of linking strategies improves power to identify target genes underlying GWAS associations.
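The core idea in the abstract, training a non-linear combination of existing linking scores plus genomic distance on eQTL-derived labels to produce a probabilistic score per candidate SNP-gene link, can be illustrated with a toy sketch. This is not the pgBoost implementation: the abstract does not specify the learner or its features, so the sketch assumes a small gradient-boosting classifier built from one-split trees (stumps), and everything in it is hypothetical, including the feature names (`peak_gene_corr`, `coaccess`, `log10_dist`), the simulated data, and the simulated labels that stand in for eQTL support.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return the one-feature threshold split (j, t, left_mean, right_mean)
    that fits the residuals with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        vals = sorted(row[j] for row in X)
        # Try roughly ten candidate thresholds per feature to keep this fast.
        for t in vals[::max(1, len(vals) // 10)]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - ml) ** 2 for r in left)
                   + sum((r - mr) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_boosted(X, y, rounds=40, lr=1.0):
    """Gradient boosting on the log loss: each round fits a stump to the
    residuals y - p and adds it to the running log-odds."""
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1.0 - p0))
    F = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        F = [fi + lr * (ml if row[j] <= t else mr) for row, fi in zip(X, F)]
    return base, lr, stumps

def predict(model, row):
    """Probabilistic score for one candidate SNP-gene link."""
    base, lr, stumps = model
    z = base + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)
    return sigmoid(z)

def simulate(n, seed=0):
    """Synthetic candidate links (assumption: not real multiome or eQTL data).
    A link is likely 'supported' only when one score is high AND the SNP is
    close to the gene, so no single feature explains the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        peak_gene_corr = rng.random()    # hypothetical linking score
        coaccess = rng.random()          # hypothetical, uninformative score
        log10_dist = rng.uniform(3, 6)   # SNP-gene distance, 1 kb to 1 Mb
        p = 0.8 if (peak_gene_corr > 0.6 and log10_dist < 4.5) else 0.05
        X.append([peak_gene_corr, coaccess, log10_dist])
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = simulate(300)
model = fit_boosted(X, y)
p_near = predict(model, [0.9, 0.5, 3.5])  # strong score, short distance
p_far = predict(model, [0.1, 0.5, 5.5])   # weak score, long distance
print(f"near link: {p_near:.3f}, far link: {p_far:.3f}")
```

Stumps make the combined score a non-linear but additive function of the inputs; deeper trees, as used in standard gradient-boosting libraries, would also capture interactions such as a score that matters only at short distances.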