Hoffman Gabriel E, Lee Donghoon, Bendl Jaroslav, Fnu Prashant, Hong Aram, Casey Clara, Alvia Marcela, Shao Zhiping, Argyriou Stathis, Therrien Karen, Venkatesh Sanan, Voloudakis Georgios, Haroutunian Vahram, Fullard John F, Roussos Panos
Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Res Sq. 2023 May 2:rs.3.rs-2705625. doi: 10.21203/rs.3.rs-2705625/v1.
Advances in single-cell and single-nucleus transcriptomics have enabled the generation of increasingly large-scale datasets from hundreds of subjects and millions of cells. These studies promise to give unprecedented insight into the cell-type-specific biology of human disease. Yet performing differential expression analyses across subjects remains difficult due to challenges in statistical modeling of these complex studies and in scaling analyses to large datasets. Our open-source R package dreamlet (DiseaseNeurogenomics.github.io/dreamlet) uses a pseudobulk approach based on precision-weighted linear mixed models to identify genes differentially expressed with traits across subjects for each cell cluster. Designed for data from large cohorts, dreamlet is substantially faster and uses less memory than existing workflows, while supporting complex statistical models and controlling the false positive rate. We demonstrate computational and statistical performance on published datasets and on a novel dataset of 1.4M single nuclei from postmortem brains of 150 Alzheimer's disease cases and 149 controls.