Zhang Ruoyu, Atwal Gurinder S, Lim Wei Keat
Regeneron Pharmaceuticals, Tarrytown, NY 10591, USA.
Patterns (N Y). 2021 Feb 15;2(3):100211. doi: 10.1016/j.patter.2021.100211. eCollection 2021 Mar 12.
With the rapid advancement of single-cell RNA-sequencing (scRNA-seq) technology, many data-preprocessing methods have been proposed to address the numerous systematic errors and the technical variability inherent in this technology. While these methods have been shown to be effective at recovering the expression of individual genes, their suitability for inferring gene-gene associations and for the subsequent reconstruction of gene networks has not been systematically investigated. In this study, we benchmarked five representative scRNA-seq normalization/imputation methods on Human Cell Atlas bone marrow data with respect to their impact on inferred gene-gene associations. Our results suggested that a considerable number of spurious correlations were introduced during the data-preprocessing steps because the raw data were oversmoothed. We proposed a model-agnostic noise-regularization method that can effectively eliminate these correlation artifacts. The noise-regularized gene-gene correlations were further used to reconstruct a gene co-expression network, which successfully revealed several known immune cell modules.
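To make the oversmoothing argument concrete, here is a minimal toy simulation in Python/NumPy. It is not the authors' code, and every modeling choice below is an assumption. Cells are assumed to be ordered along a known pseudotime. A moving average over `window` neighboring cells stands in for an aggressive imputation step; it is not one of the five benchmarked methods. Noise regularization is modeled as adding i.i.d. Gaussian noise with a fixed standard deviation `sigma` to the preprocessed matrix, which may differ from the noise scheme actually used in the study. The names `smooth_along_pseudotime`, `noise_regularize`, and `mean_abs_offdiag_corr` are hypothetical and chosen for illustration only.

```python
import numpy as np


def smooth_along_pseudotime(X, window):
    """Toy stand-in for an aggressive imputation step (an assumption, not a
    benchmarked method): replace each cell (row) by the mean of the cells in a
    centered window of `window` cells along pseudotime."""
    kernel = np.ones(window)
    # Number of cells actually averaged at each position (fewer near the ends).
    counts = np.convolve(np.ones(X.shape[0]), kernel, mode="same")
    return np.column_stack(
        [np.convolve(col, kernel, mode="same") / counts for col in X.T]
    )


def noise_regularize(X, sigma, rng):
    """Hypothetical noise-regularization step: add i.i.d. Gaussian noise with
    a fixed standard deviation `sigma` to every entry of the matrix."""
    return X + rng.normal(0.0, sigma, size=X.shape)


def mean_abs_offdiag_corr(X):
    """Mean absolute Pearson correlation over all distinct gene (column) pairs."""
    C = np.corrcoef(X, rowvar=False)
    off_diag = ~np.eye(C.shape[0], dtype=bool)
    return np.abs(C[off_diag]).mean()


rng = np.random.default_rng(0)
n_cells, n_null, window, sigma = 500, 40, 101, 0.5
t = np.linspace(0.0, 1.0, n_cells)  # assumed pseudotime ordering of cells
signal = np.sin(2 * np.pi * t)

# Columns 0-1: a gene pair that truly co-varies along pseudotime.
# Columns 2+: independent "null" genes with no real association.
true_pair = np.column_stack([signal, signal]) + rng.normal(size=(n_cells, 2))
null = rng.normal(size=(n_cells, n_null))
raw = np.column_stack([true_pair, null])

smoothed = smooth_along_pseudotime(raw, window)
regularized = noise_regularize(smoothed, sigma, rng)

for name, X in [("raw", raw), ("smoothed", smoothed), ("regularized", regularized)]:
    true_r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
    null_r = mean_abs_offdiag_corr(X[:, 2:])
    print(f"{name:<12} true-pair r = {true_r:.2f}, mean |r| among null genes = {null_r:.3f}")
```

In this sketch, smoothing shrinks each null gene's variance to roughly 1/`window` of its raw value and makes it strongly autocorrelated across neighboring cells. Independent genes therefore pick up large chance correlations. A fixed amount of added noise swamps that small residual variance, but it only partly dilutes the strong shared signal of the truly co-varying pair. This illustrates one plausible reading of how added noise can separate smoothing artifacts from real associations.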