Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA.
BMC Genomics. 2022 Mar 5;23(1):180. doi: 10.1186/s12864-021-08281-y.
The cyclical nature of gene expression in the intraerythrocytic development cycle (IDC) of the malaria parasite, Plasmodium falciparum, confounds the accurate detection of specific transcriptional differences, such as those provoked by the development of drug resistance. In laboratory studies, P. falciparum cultures are synchronized to remove this confounding factor, but timely detection of emerging resistance to artemisinin therapies requires rapid analysis of transcriptomes extracted directly from clinical samples. Here we propose the use of cyclical regression covariates (CRC) to eliminate the major confounding effect of developmentally driven transcriptional changes in clinical samples. We show that eliminating this confounding factor reduces both Type I and Type II errors, and we demonstrate the effectiveness of the approach using a published dataset of 1043 transcriptomes extracted directly from blood samples of patients with different clearance times after treatment with artemisinin.
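The abstract does not state the functional form of the covariates. As an illustrative assumption only, and not necessarily the model used in the paper, a regression with cyclical covariates is often written with a sine and a cosine term for the position of each sample in the cycle. Here y_{gi} is the expression of gene g in sample i, x_i is the covariate of interest (for example, a clearance-time phenotype), t_i is an estimated developmental age in hours, and T is the cycle length (roughly 48 hours for the P. falciparum IDC); all of these symbols are introduced for this sketch.

```latex
y_{gi} = \beta_{g0} + \beta_{g1}\, x_i
       + a_g \sin\!\left(\frac{2\pi t_i}{T}\right)
       + b_g \cos\!\left(\frac{2\pi t_i}{T}\right)
       + \varepsilon_{gi}
```

Under this form, the test of interest is on \beta_{g1}, while a_g and b_g absorb expression changes that are explained by developmental stage alone.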
We apply this method to two publicly available datasets and demonstrate its ability to reduce the confounding of transcript-level differences caused by misaligned intraerythrocytic development time. Adjusting the clinical dataset of 1043 transcriptomes with CRC detects fewer functional categories than previously reported for the same dataset adjusted using other methods. Most of the functional categories we detect are the same as those previously reported, but they contain fewer genes. In addition, the CRC method identifies genes in a functional category that was absent from the results when the dataset was adjusted using other methods. Analysis of differential gene expression in the clinical samples that vary broadly in developmental stage detected far fewer transcripts in fewer functional categories while, at the same time, identifying genes in two functional categories not present in the unadjusted analysis. These differences are consistent with the expectation that CRC reduces both false positives and false negatives, with the largest effect on datasets from samples with greater variance in developmental stage.
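The false-positive mechanism described here can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' implementation. It assumes the developmental age of each sample is already known in hours, that the cycle is about 48 hours long, and that a single sine and cosine pair is an adequate covariate set. The function names `cyclical_covariates` and `group_effect`, and the stage distributions, are hypothetical.

```python
import numpy as np

CYCLE_HOURS = 48.0  # assumed approximate length of the P. falciparum IDC


def cyclical_covariates(hours, cycle_hours=CYCLE_HOURS):
    """Return sine and cosine columns encoding position in the cycle."""
    phase = 2.0 * np.pi * np.asarray(hours, dtype=float) / cycle_hours
    return np.column_stack([np.sin(phase), np.cos(phase)])


def group_effect(expression, group, hours=None):
    """Least-squares group coefficient, optionally adjusted for stage."""
    columns = [np.ones(len(group)), np.asarray(group, dtype=float)]
    if hours is not None:
        columns.append(cyclical_covariates(hours))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return coef[1]  # coefficient on the group indicator


rng = np.random.default_rng(0)
n = 200
group = np.repeat([0, 1], n)

# Hypothetical stage distributions: group 1 samples are older on average.
hours = np.concatenate([rng.normal(10, 3, n), rng.normal(20, 3, n)]) % CYCLE_HOURS

# Simulated gene whose expression is purely cyclical: NO true group effect.
expression = 2.0 * np.sin(2 * np.pi * hours / CYCLE_HOURS) + rng.normal(0, 0.3, 2 * n)

unadjusted = group_effect(expression, group)
adjusted = group_effect(expression, group, hours)
print(f"unadjusted group effect:   {unadjusted:.3f}")
print(f"CRC-adjusted group effect: {adjusted:.3f}")
```

In this toy setting the unadjusted model attributes the stage difference to the group, which would register as a false positive, while the adjusted estimate stays close to zero. The price is a wider standard error on the group coefficient whenever stage and group are correlated.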
Cyclical regression covariates have immediate application to parasite transcriptome sequencing directly from clinical blood samples and to cost-constrained in vitro experiments.