Department of Statistics, The George Washington University, Washington, District of Columbia, USA.
School of Mathematical Science, University of Science and Technology of China, Hefei, China.
J Comput Biol. 2022 Jul;29(7):634-649. doi: 10.1089/cmb.2021.0597. Epub 2022 May 16.
In a single-cell RNA-seq (scRNA-seq) data set, a high proportion of missing values (or an excessive number of zeroes) is frequently observed. Follow-up tasks such as clustering analysis and differential expression analysis generally require a data set without missing values, and many imputation approaches have been proposed for this purpose. Multiple imputation (MI) is a well-established approach for addressing the biases that can arise when a follow-up analysis is based on a single imputed data set. However, its use in the analysis of scRNA-seq data has received little investigation. In this study, we investigated how to efficiently apply MI to the clustering analysis and the differential expression analysis of scRNA-seq data, and we proposed an MI procedure for each of these two tasks. scIGANs, a recently proposed scRNA-seq imputation method based on generative adversarial networks (GANs), can conveniently generate multiple randomly imputed data sets, so we implemented our MI procedures on top of it. To demonstrate the improvements achieved by MI, we analyzed three well-known scRNA-seq data sets. MI yielded improved performance in both the clustering analysis and the differential expression analysis. Our applications to experimental scRNA-seq data illustrated the advantages of MI over one-time imputation of missing values.
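The abstract does not spell out how results from the individual imputed data sets are combined. As an illustration only, a standard way in the general MI literature to pool a scalar estimate (for example, one gene's log fold change) across m imputations is Rubin's rules: average the point estimates, then add the average within-imputation variance to an inflated between-imputation variance. The sketch below assumes this generic rule. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the paper or from scIGANs.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates of one quantity using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical values for one gene from three imputed data sets (assumed, not real data).
log_fc = [1.0, 2.0, 3.0]
squared_se = [0.5, 0.5, 0.5]
pooled, total_var = pool_rubin(log_fc, squared_se)
```

The total variance is larger than the within-imputation variance alone. That extra term carries the uncertainty due to imputation, which an analysis of a single imputed data set ignores.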