Gural Brian, Kirkland Logan, Hockett Abbey, Sandroni Peyton, Zhang Jiandong, Rosa-Garrido Manuel, Swift Samantha K, Chapski Douglas, Flinn Michael A, O'Meara Caitlin C, Vondriska Thomas M, Patterson Michaela, Jensen Brian C, Rau Christoph D
Department of Genetics and Computational Medicine Program, UNC School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
McAllister Heart Institute, UNC School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
bioRxiv. 2024 Aug 10:2024.08.09.607400. doi: 10.1101/2024.08.09.607400.
Recent advances in single-cell sequencing have led to an increased focus on the role of cell-type composition in phenotypic presentation and disease progression. Cell-type composition research in the heart is challenging because cardiomyocytes are large and frequently multinucleated, which prevents most single-cell approaches from obtaining accurate measurements of cell composition. Our studies reveal that ignoring cell-type composition when calculating differentially expressed genes (DEGs) can have significant consequences. For example, a relatively small change in cell abundance of only 10% can result in over 25% of DEGs being false positives.
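To illustrate why a shift in composition alone can look like differential expression, the following minimal sketch fits an ordinary least-squares model to one hypothetical cardiomyocyte-specific gene. Everything here is an assumption made for illustration: the cardiomyocyte fractions, the expression scale, and the helper `group_effect` are invented, and ordinary least squares stands in for the negative binomial model that DESeq2 fits. It is not the authors' pipeline.

```python
import numpy as np

def group_effect(expression, group, covariates=None):
    """Return the fitted group coefficient from an ordinary least-squares model.

    expression: per-sample expression of one gene
    group:      0/1 indicator per sample (e.g. control vs. treated)
    covariates: optional per-sample covariate(s), e.g. a cell-type fraction
    """
    columns = [np.ones(len(group)), np.asarray(group, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, np.asarray(expression, dtype=float), rcond=None)
    return coef[1]  # coefficient on the group indicator

# Hypothetical data: 4 samples per group, cardiomyocyte fraction drops by ~10 points.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
cm_fraction = np.array([0.62, 0.58, 0.60, 0.61, 0.50, 0.52, 0.49, 0.51])

# The gene's per-cell expression is identical in both groups; bulk signal
# changes only because there are fewer cardiomyocytes in group 1.
expression = 100.0 * cm_fraction

unadjusted = group_effect(expression, group)             # apparent group effect
adjusted = group_effect(expression, group, cm_fraction)  # composition-adjusted effect

print(f"unadjusted group effect: {unadjusted:.2f}")
print(f"adjusted group effect:   {adjusted:.2f}")
```

In this toy setting the unadjusted model attributes the composition shift to the group, while including the cell-type fraction as a covariate leaves essentially no group effect for the gene.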
We have implemented an algorithmic approach that uses snRNAseq datasets as a reference to accurately calculate cell-type compositions from bulk RNAseq datasets through robust data cleaning, gene selection, and multi-sample cross-subject and cross-cell-type deconvolution. We applied our approach to cardiomyocyte-specific α1A adrenergic receptor (CM-α1A-AR) knockout mice. Mice aged 8-12 weeks (either WT or CM-α1A-KO) were subjected to permanent left coronary artery (LCA) ligation or sham surgery (n=4 per group). Transcriptomes from the infarct border zones were collected 3 days later. We used our algorithm to determine cell-type abundances, performed corrected differential expression calculations using DESeq2, and validated the findings using RNAscope.
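The general idea of reference-based deconvolution can be sketched with non-negative least squares (NNLS): a bulk profile is modeled as a non-negative mixture of per-cell-type reference profiles. This is a generic stand-in technique, not the authors' algorithm, whose cleaning, gene selection, and multi-sample steps are not specified here. The signature matrix, the fractions, and the function `estimate_fractions` are hypothetical, and the simulated bulk sample is noiseless.

```python
import numpy as np
from scipy.optimize import nnls

def estimate_fractions(signature, bulk):
    """Estimate cell-type fractions for one bulk sample.

    signature: (genes x cell_types) reference expression, e.g. snRNAseq averages
    bulk:      (genes,) bulk expression vector for the same genes
    Returns non-negative fractions normalized to sum to 1.
    """
    coef, _residual_norm = nnls(signature, bulk)
    total = coef.sum()
    if total == 0:
        raise ValueError("NNLS returned an all-zero solution")
    return coef / total

# Hypothetical reference: 200 marker genes across three cell types.
rng = np.random.default_rng(0)
cell_types = ["cardiomyocyte", "fibroblast", "macrophage"]
signature = rng.gamma(shape=2.0, scale=50.0, size=(200, len(cell_types)))

# Simulate a noiseless bulk sample as a known mixture of the references.
true_fractions = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_fractions

estimated = estimate_fractions(signature, bulk)
for name, value in zip(cell_types, estimated):
    print(f"{name}: {value:.3f}")
```

With real data, the bulk and reference profiles would need to be on comparable scales and restricted to informative genes before a fit like this is meaningful.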
Uncorrected DEGs for the CM-α1A-KO × LCA interaction term featured many cell-type-specific genes such as (fibroblasts) and (cardiomyocytes) and showed overall GO enrichment for terms pertaining to cardiomyocyte differentiation (P=3.1E-4). Using our algorithm, we observed a striking loss of cardiomyocytes and gain in fibroblasts in the α1A-KO + LCA mice that was not seen in WT + LCA animals, although macrophage abundance increased similarly in both conditions. This is consistent with prior results that showed a much more severe heart failure phenotype in CM-α1A-KO + LCA mice. Following correction for cell-type composition, our DEGs highlight a novel set of genes enriched for GO terms such as cardiac contraction (P=3.7E-5) and actin filament organization (P=6.3E-5).
Our algorithm identifies and corrects for cell-type abundance in bulk RNAseq datasets, opening new avenues for research on novel genes and pathways and improving our understanding of the role of cardiac cell types in cardiovascular disease.