Nakatsuka Nathan, Adler Drew, Jiang Longda, Hartman Austin, Cheng Evan, Klann Eric, Satija Rahul
New York Genome Center; New York, NY 10013.
Center for Genomics and Systems Biology, New York University; New York, NY 10003.
bioRxiv. 2025 Feb 3:2024.10.15.618577. doi: 10.1101/2024.10.15.618577.
We assessed the reproducibility of differentially expressed genes (DEGs) in previously published Alzheimer's disease (AD), Parkinson's disease (PD), schizophrenia (SCZ), and COVID-19 single-cell RNA sequencing (scRNA-seq) studies. While transcriptional scores from DEGs of individual PD and COVID-19 datasets had moderate predictive power for the case-control status of other datasets (area under the curve [AUC]=0.77 and 0.75, respectively), genes from individual AD and SCZ datasets had poor predictive power (AUC=0.68 and 0.55, respectively). We developed a non-parametric meta-analysis method, SumRank, based on the reproducibility of relative differential expression ranks across datasets, and found DEGs with improved predictive power (AUC=0.88, 0.91, 0.78, and 0.62). By multiple other metrics, the specificity and sensitivity of these genes were substantially higher than those of genes discovered by dataset merging and inverse variance weighted p-value aggregation methods. The DEGs revealed known and novel biological pathways, and we validate one of these genes as down-regulated in AD mouse oligodendrocytes. Lastly, we evaluate factors influencing the reproducibility of individual studies as a prospective guide for experimental design.
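The abstract describes SumRank only as a non-parametric method built on the reproducibility of relative differential expression ranks across datasets. As a rough illustration of that general idea, and not the authors' implementation, the sketch below ranks genes within each dataset by p-value, sums the fractional ranks per gene, and compares each sum against a permutation null. The function names (`sum_rank_scores`, `permutation_pvalues`), the input layout (datasets by genes), and the pooled permutation null are all assumptions made for this example.

```python
import numpy as np


def sum_rank_scores(pvals):
    """Sum of within-dataset fractional ranks per gene (hypothetical helper).

    pvals: array of shape (n_datasets, n_genes); smaller means stronger evidence.
    A low score means the gene ranks near the top in many datasets.
    """
    pvals = np.asarray(pvals, dtype=float)
    n_genes = pvals.shape[1]
    # Applying argsort twice yields ordinal ranks (ties are broken arbitrarily).
    ranks = pvals.argsort(axis=1).argsort(axis=1) + 1
    return (ranks / n_genes).sum(axis=0)


def permutation_pvalues(pvals, n_perm=200, seed=0):
    """Empirical p-values from shuffling gene labels within each dataset.

    This pooled null is an assumption of the sketch, not a detail from the paper.
    """
    pvals = np.asarray(pvals, dtype=float)
    rng = np.random.default_rng(seed)
    observed = sum_rank_scores(pvals)
    null = []
    for _ in range(n_perm):
        # Shuffle each dataset's row independently to break cross-dataset agreement.
        shuffled = np.stack([rng.permutation(row) for row in pvals])
        null.append(sum_rank_scores(shuffled))
    null = np.sort(np.concatenate(null))
    # Count null scores <= observed; add 1 so no p-value is exactly zero.
    counts = np.searchsorted(null, observed, side="right")
    return (counts + 1) / (null.size + 1)


# Toy input: 4 datasets, 50 genes; gene 0 is consistently the top hit.
rng = np.random.default_rng(1)
pvals = rng.uniform(0.01, 1.0, size=(4, 50))
pvals[:, 0] = 1e-6

scores = sum_rank_scores(pvals)
perm_p = permutation_pvalues(pvals)
print("most reproducible gene index:", int(scores.argmin()))
```

Because only ranks enter the score, a gene that is a modest but consistent hit in every dataset can outrank a gene with an extreme p-value in a single dataset, which is the property the abstract attributes to rank-based reproducibility.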