Department of Epidemiology and Public Health, Division of Biostatistics and Bioinformatics, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
Department of Neurosurgery, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
Bioinformatics. 2022 Sep 30;38(19):4530-4536. doi: 10.1093/bioinformatics/btac563.
Cell-type deconvolution of bulk tissue RNA sequencing (RNA-seq) data is an important step toward understanding the variations in cell-type composition among disease conditions. Owing to recent advances in single-cell RNA sequencing (scRNA-seq) and the availability of large amounts of bulk RNA-seq data in disease-relevant tissues, various deconvolution methods have been developed. However, the performance of existing methods relies heavily on the quality of information provided by external data sources, such as the scRNA-seq data selected as a reference and prior biological information.
We present the Integrated and Robust Deconvolution (InteRD) algorithm to infer cell-type proportions from target bulk RNA-seq data. Owing to the innovative use of penalized regression with a new evaluation criterion for deconvolution, InteRD has three primary advantages. First, it effectively integrates deconvolution results from multiple scRNA-seq datasets. Second, InteRD calibrates estimates from reference-based deconvolution by incorporating additional biological information as priors. Third, the proposed algorithm is robust to inaccurate external information imposed in the deconvolution system. Extensive numerical evaluations and real-data applications demonstrate that InteRD yields more accurate and robust cell-type proportion estimates that agree well with known biology.
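The two ideas described above, combining estimates from several references and calibrating them toward a prior through a penalty, can be illustrated with a toy sketch. This is not the InteRD objective or its R interface, neither of which is specified in this abstract. The function names `integrate_estimates` and `shrink_toward_prior`, the fixed weights, the squared-error penalty, and all numbers are assumptions made purely for illustration.

```python
# Toy illustration only: hypothetical helpers, not the InteRD package API.

def integrate_estimates(estimates, weights):
    """Combine per-reference cell-type proportion vectors by a weighted average.

    `estimates` is a list of proportion vectors (one per scRNA-seq reference),
    and `weights` holds one non-negative weight per reference (assumed given).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n_types = len(estimates[0])
    return [
        sum(w * est[i] for w, est in zip(weights, estimates)) / total
        for i in range(n_types)
    ]


def shrink_toward_prior(estimate, prior, lam):
    """Closed-form minimiser of ||p - estimate||^2 + lam * ||p - prior||^2.

    Setting the gradient to zero gives p = (estimate + lam * prior) / (1 + lam).
    If both inputs are non-negative and sum to 1, so does the result.
    A small `lam` limits the influence of a possibly inaccurate prior.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return [(e + lam * q) / (1.0 + lam) for e, q in zip(estimate, prior)]


if __name__ == "__main__":
    # Made-up proportions for three cell types from two references.
    ref_estimates = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
    combined = integrate_estimates(ref_estimates, weights=[1.0, 1.0])

    # Made-up prior proportions; lam controls how strongly they are trusted.
    prior = [0.3, 0.5, 0.2]
    calibrated = shrink_toward_prior(combined, prior, lam=1.0)

    print("combined:  ", combined)
    print("calibrated:", calibrated)
```

In this sketch the penalty weight `lam` is the only tuning knob. Setting it to zero returns the combined reference-based estimate unchanged, which is one simple way to picture robustness to unreliable prior information.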
The proposed InteRD framework is implemented in R and the package is available at https://cran.r-project.org/web/packages/InteRD/index.html.
Supplementary data are available at Bioinformatics online.