Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA.
Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, USA.
mSystems. 2022 Aug 30;7(4):e0025822. doi: 10.1128/msystems.00258-22. Epub 2022 Jul 5.
Malaria symptoms are caused by the development of parasites within the blood of an infected host. Bulk RNA sequencing (RNA-seq) of infected blood can reveal interactions between parasites and the host immune system during an infection. However, because multiple developmental stages with distinct transcriptional profiles are present concurrently in infected blood, such analyses must be corrected for differences in cell composition among samples. Gene expression deconvolution is a statistical approach for inferring the cell composition of complex tissues characterized by bulk RNA-seq, using gene expression profiles from reference cell types. Here, we describe the evaluation of a species-agnostic reference data set that can be used for efficient and accurate gene expression deconvolution of bulk RNA-seq data generated from any species and for correcting gene expression analyses for biases caused by differences in stage composition among samples.
Differences in cell type proportions among samples can introduce artifacts in gene expression analyses and mask genuine differences in gene regulation. Gene expression deconvolution allows the proportion of each cell type present in a sample to be estimated directly from bulk RNA sequencing data, but this approach requires a reference data set containing the signature profile of each cell type. Here, we evaluate the suitability of a rodent malaria parasite gene expression data set for estimating the proportion of each parasite developmental stage present in bulk RNA sequencing data generated from blood-stage infections with the human parasites Plasmodium falciparum and Plasmodium vivax. These analyses provide a species-agnostic approach for reliably estimating stage proportions in infected human blood and for correcting subsequent gene expression analyses for these variations.
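To make the idea of gene expression deconvolution concrete, the following is a minimal sketch in Python, not the method used in this study. It assumes a simple linear mixing model in which a bulk expression profile is a weighted sum of per-stage signature profiles, and it estimates the weights by non-negative least squares. The stage labels, the simulated signature matrix, and the function name deconvolve are illustrative assumptions and do not come from the reference data set described above.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical stage labels, chosen only for illustration.
STAGES = ["ring", "trophozoite", "schizont", "gametocyte"]


def deconvolve(bulk, signatures):
    """Estimate stage proportions for one bulk sample.

    bulk       : 1-D array of gene expression values (one per gene)
    signatures : 2-D array, genes x stages, reference expression profiles
    Returns non-negative proportions that sum to 1 (assumed linear mixing).
    """
    coefficients, _residual = nnls(signatures, bulk)
    total = coefficients.sum()
    # Normalize to proportions; leave as-is if every coefficient is zero.
    return coefficients / total if total > 0 else coefficients


# Simulated reference profiles (200 genes x 4 stages); not real data.
rng = np.random.default_rng(0)
signatures = rng.gamma(shape=2.0, scale=50.0, size=(200, len(STAGES)))

# Simulated bulk sample built from known proportions, without noise.
true_props = np.array([0.6, 0.25, 0.1, 0.05])
bulk = signatures @ true_props

estimated = deconvolve(bulk, signatures)
for stage, proportion in zip(STAGES, estimated):
    print(f"{stage}: {proportion:.3f}")
```

In this noiseless simulation the estimated proportions match the simulated ones. With real data, the reference and bulk profiles would first need to be restricted to shared (for a cross-species reference, orthologous) genes and placed on a comparable scale. The resulting proportions could then be included as covariates in a differential expression model to correct for stage composition.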