Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors of disease.
Author information
Abood Abdullah, Mesner Larry D, Jeffery Erin D, Murali Mayank, Lehe Micah, Saquing Jamie, Farber Charles R, Sheynkman Gloria M
Publication information
bioRxiv. 2023 Mar 21:2023.03.17.531557. doi: 10.1101/2023.03.17.531557.
A major fraction of loci identified by genome-wide association studies (GWASs) lead to alterations in alternative splicing, but interpreting how such alterations affect proteins is hindered by the technical limitations of short-read RNA-seq, which cannot directly link splicing events to full-length transcript or protein isoforms. Long-read RNA-seq is a powerful tool to define and quantify transcript isoforms and, more recently, to infer the existence of protein isoforms. Here we present a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the protein isoforms that the affected transcripts ultimately encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4 PP ≥ 0.75). We generated deep-coverage PacBio long-read RNA-seq data (approximately 22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. By casting the colocalized sQTLs directly onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Using these data, we created one of the first proteome-scale resources defining full-length isoforms impacted by colocalized sQTLs. Overall, we found that 74 sQTLs influenced isoforms likely impacted by nonsense-mediated decay (NMD) and 190 potentially resulted in the expression of new protein isoforms. Finally, we identified colocalizing sQTLs for splice junctions between two mutually exclusive exons and for two different transcript termination sites, a configuration that is impossible to interpret without long-read RNA-seq data. siRNA-mediated knockdown in osteoblasts showed two isoforms with opposing effects on mineralization.
We expect our approach to be widely generalizable across diverse clinical traits and accelerate system-scale analyses of protein isoform activities modulated by GWAS loci.
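The step of "casting" colocalized sQTLs onto isoforms can be illustrated with a minimal sketch. It assumes each sQTL is reported for a splice junction (an intron defined by its flanking exon boundaries) and that each long-read isoform is available as an ordered exon list; the function names, coordinate convention, and toy isoforms below are hypothetical and are not taken from the authors' pipeline.

```python
from collections import defaultdict

def introns(exons):
    """Return the introns implied by an exon list as (upstream exon end, downstream exon start)."""
    exons = sorted(exons)
    return {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}

def map_junctions_to_isoforms(sqtl_junctions, isoforms):
    """Map each sQTL junction to the isoforms whose intron chain contains it."""
    index = defaultdict(set)
    for iso_id, (chrom, strand, exons) in isoforms.items():
        for donor, acceptor in introns(exons):
            index[(chrom, strand, donor, acceptor)].add(iso_id)
    # Junctions absent from every isoform map to an empty list.
    return {j: sorted(index.get(j, ())) for j in sqtl_junctions}

# Hypothetical gene with two mutually exclusive middle exons.
isoforms = {
    "iso_A": ("chr1", "+", [(100, 200), (300, 400), (700, 800)]),
    "iso_B": ("chr1", "+", [(100, 200), (500, 600), (700, 800)]),
}
junctions = [
    ("chr1", "+", 200, 300),   # junction into the first alternative exon
    ("chr1", "+", 200, 500),   # junction into the second alternative exon
    ("chr1", "+", 900, 1000),  # junction not observed in any isoform
]
result = map_junctions_to_isoforms(junctions, isoforms)
```

In this toy case each junction into a mutually exclusive exon resolves to exactly one full-length isoform, which is the kind of assignment that short-read junction counts alone cannot provide.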