Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA.
Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA; Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA.
Am J Hum Genet. 2024 Sep 5;111(9):1914-1931. doi: 10.1016/j.ajhg.2024.07.003. Epub 2024 Jul 29.
A major fraction of loci identified by genome-wide association studies (GWASs) mediate alternative splicing, but mechanistic interpretation is hindered by the technical limitations of short-read RNA sequencing (RNA-seq), which cannot directly link splicing events to full-length protein isoforms. Long-read RNA-seq is a powerful tool for characterizing transcript isoforms and, more recently, for inferring the existence of protein isoforms. Here, we present an approach that integrates information from GWASs, splicing quantitative trait loci (sQTLs), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4PP ≥ 0.75). We generated PacBio Iso-Seq data (N = ∼22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were unannotated. By casting the sQTLs onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Overall, we found 74 sQTLs that influenced isoforms likely impacted by nonsense-mediated decay and 190 that potentially resulted in the expression of unannotated protein isoforms. Finally, we functionally validated colocalizing sQTLs in TPM2: siRNA-mediated knockdown in osteoblasts showed that two TPM2 isoforms had opposing effects on mineralization, whereas knockdown of the entire gene had no effect. Our approach should be generalizable across diverse clinical traits and should provide insights into protein isoform activities modulated by GWAS loci.