Tian Yijun, Wu Lang, Huang Chang-Ching, Wang Liang
Department of Tumor Biology, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL 33612, United States.
Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI 96813, USA.
bioRxiv. 2024 Jun 21:2024.06.19.599704. doi: 10.1101/2024.06.19.599704.
While genome-wide association studies and expression quantitative trait loci (eQTL) analyses have made significant progress in identifying noncoding variants associated with prostate cancer risk and bulk tissue transcriptome changes, the regulatory effect of these genetic elements on gene expression remains largely unknown. Recent developments in single-cell sequencing have made it possible to perform ATAC-seq and RNA-seq profiling simultaneously to capture functional associations between chromatin accessibility and gene expression. In this study, we tested our hypothesis that this multiome single-cell approach allows for mapping regulatory elements and their target genes at prostate cancer risk loci. We applied a 10X Multiome ATAC + Gene Expression platform to encapsulate Tn5 transposase-tagged nuclei from multiple prostate cell lines, yielding a total of 65,501 high-quality single cells from the RWPE1, RWPE2, PrEC, BPH1, DU145, PC3, 22Rv1 and LNCaP cell lines. To address the data sparsity commonly seen in single-cell sequencing, we performed targeted sequencing to enrich sequencing data at prostate cancer risk loci involving 2,730 candidate germline variants and 273 associated genes. Although it did not increase the number of captured cells, the targeted multiome data improved eQTL gene expression abundance by about 20% and chromatin accessibility abundance by about 5%. Based on this multiomic profiling, we further associated RNA expression alterations with chromatin accessibility of germline variants at the single-cell level. Cross-validation analysis showed high overlap between the multiome associations and the bulk eQTL findings from the GTEx prostate cohort. We found that about 20% of GTEx eQTLs were covered by the significant multiome associations (P-value ≤ 0.05, gene abundance percentage ≥ 5%), and roughly 10% of the multiome associations could be identified by significant GTEx eQTLs.
We also analyzed accessible regions with available heterozygous SNP reads and observed more frequent associations in genomic regions with allelically accessible variants (P = 0.0055). Among these findings were previously reported regulatory variants, including rs60464856 (multiome P-value = 0.0099 in BPH1) and rs7247241 (multiome P-value = 0.0002-0.0004 in 22Rv1). We also functionally validated a new regulatory SNP, rs2474694, and its target gene (multiome P-value = 0.00956 in BPH1 and 0.00625 in DU145) by reporter assay and SILAC proteomics sequencing. Taken together, our data demonstrate the feasibility of the multiome single-cell approach for identifying regulatory SNPs and their regulated genes.
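The two overlap percentages reported above (the share of GTEx eQTLs covered by multiome associations, and the share of multiome associations recovered by GTEx) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Association` record, the function names, and the toy identifiers are hypothetical, and "gene abundance percentage" is assumed here to mean the percentage of cells in which the gene is detected.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Pair = Tuple[str, str]  # (SNP id, gene symbol)


class Association(NamedTuple):
    """Hypothetical record for one multiome SNP-gene association."""
    snp: str
    gene: str
    p_value: float
    gene_abundance_pct: float  # assumed: percent of cells expressing the gene


def significant_pairs(
    assocs: Iterable[Association],
    p_max: float = 0.05,
    min_abundance_pct: float = 5.0,
) -> Set[Pair]:
    """Keep SNP-gene pairs passing the thresholds quoted in the abstract."""
    return {
        (a.snp, a.gene)
        for a in assocs
        if a.p_value <= p_max and a.gene_abundance_pct >= min_abundance_pct
    }


def reciprocal_overlap(multiome: Set[Pair], gtex: Set[Pair]) -> Tuple[float, float]:
    """Return (fraction of GTEx pairs covered, fraction of multiome pairs recovered)."""
    shared = multiome & gtex
    frac_gtex = len(shared) / len(gtex) if gtex else 0.0
    frac_multiome = len(shared) / len(multiome) if multiome else 0.0
    return frac_gtex, frac_multiome


# Toy data with placeholder identifiers, for illustration only.
toy_multiome = [
    Association("snpA", "GENE1", 0.010, 12.0),  # passes both thresholds
    Association("snpB", "GENE2", 0.040, 6.0),   # passes both thresholds
    Association("snpC", "GENE3", 0.200, 30.0),  # fails the P-value threshold
    Association("snpD", "GENE4", 0.001, 2.0),   # fails the abundance threshold
]
toy_gtex = {("snpA", "GENE1"), ("snpC", "GENE3"), ("snpE", "GENE5"), ("snpF", "GENE6")}

multiome_sig = significant_pairs(toy_multiome)
frac_gtex, frac_multiome = reciprocal_overlap(multiome_sig, toy_gtex)
```

Treating each association as a (SNP, gene) pair keeps the comparison symmetric, so the same intersection yields both percentages by changing only the denominator.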