School of Computer Science, Northwestern Polytechnical University, Chang'an Ave, Changan Qu, Xi'an City, Shaanxi Province, China.
BMC Bioinformatics. 2021 Aug 25;22(Suppl 9):281. doi: 10.1186/s12859-021-04187-4.
Understanding which cell types make up an intact tissue, and in what proportions, is important because changes in certain cell types underlie disease in humans. Single-cell sequencing can provide cell type composition and proportions, but it is currently expensive and cannot be applied in clinical studies involving a large number of subjects. It is therefore useful to combine a bulk RNA-Seq dataset with a single-cell RNA-Seq dataset and deconvolute the bulk data to obtain the cell type composition of the tissue.
By analyzing existing cell population prediction methods, we found that most of them require a cell-type-specific gene expression profile as input, in the form of a signature matrix. In real applications, however, a suitable signature matrix is not always available. To solve this problem, we propose a novel method, named DCap, to predict cell abundance. DCap is a deconvolution method based on non-negative least squares. During the non-negative least squares calculation, DCap applies weights that account for the measurement noise of bulk RNA-Seq data and the calculation error of single-cell RNA-Seq data, and it solves the resulting weighted least squares problem iteratively. By weighting both the bulk tissue gene expression matrix and the single-cell gene expression matrix, DCap minimizes the measurement error of bulk RNA-Seq and also reduces errors that arise when cells of the same type express different numbers of genes in different samples. In our evaluation, DCap predicted cell type abundance better than existing methods.
DCap solves the deconvolution problem using weighted non-negative least squares to predict cell type abundance in tissues. It gives better prediction results than existing methods and does not require a signature matrix of cell-type-specific gene expression profiles to be prepared in advance. DCap makes it easier to study changes in cell proportions in diseased tissues and can provide more information for the follow-up treatment of diseases.
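The weighted non-negative least squares idea described above can be illustrated with a minimal sketch. This is not the DCap implementation: the function names `build_reference` and `weighted_nnls_deconvolve`, the per-type averaging of single-cell profiles, and the reweighting rule `1 / (1 + fitted)` are all assumptions chosen for illustration, since the abstract does not specify how DCap computes its weights. Only `scipy.optimize.nnls` is a real library call.

```python
import numpy as np
from scipy.optimize import nnls


def build_reference(sc_expr, cell_types):
    """Average single-cell profiles per cell type (hypothetical helper).

    sc_expr: genes x cells expression matrix.
    cell_types: one label per cell (column of sc_expr).
    Returns a genes x types reference matrix and the sorted type names.
    """
    labels = np.asarray(cell_types)
    types = sorted(set(labels.tolist()))
    cols = [sc_expr[:, labels == t].mean(axis=1) for t in types]
    return np.column_stack(cols), types


def weighted_nnls_deconvolve(bulk, reference, n_iter=10):
    """Iteratively reweighted NNLS (illustrative weighting, not DCap's).

    bulk: expression vector of one bulk sample (length = genes).
    reference: genes x types matrix, e.g. from build_reference.
    Returns estimated cell type proportions that sum to 1.
    """
    weights = np.ones(bulk.shape[0])
    coef = np.zeros(reference.shape[1])
    for _ in range(n_iter):
        # Weighted least squares = ordinary least squares on rows
        # scaled by the square root of the weights.
        root_w = np.sqrt(weights)
        coef, _ = nnls(reference * root_w[:, None], bulk * root_w)
        fitted = reference @ coef
        # Assumed rule: down-weight highly expressed genes, whose
        # measurement noise tends to be larger in absolute terms.
        weights = 1.0 / (1.0 + fitted)
    total = coef.sum()
    return coef / total if total > 0 else coef


# Usage on synthetic, noise-free data (values are made up).
rng = np.random.default_rng(0)
reference = rng.uniform(1.0, 10.0, size=(50, 3))
true_props = np.array([0.2, 0.3, 0.5])
bulk = reference @ true_props
estimated = weighted_nnls_deconvolve(bulk, reference)
```

On noise-free input the weights do not change the solution, so the sketch only becomes interesting with noisy data, where the choice of weighting rule determines which genes dominate the fit.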