Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, USA.
Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, USA.
Genome Biol. 2021 Apr 12;22(1):102. doi: 10.1186/s13059-021-02290-6.
Deconvolution analyses have been widely used to track compositional alterations of cell types in gene expression data. Although many novel methods have been developed, the effects of their modeling assumptions and tuning parameters remain poorly understood, which makes it challenging for researchers to select a deconvolution method suited to the targeted biological conditions.
To systematically reveal the pitfalls and challenges of deconvolution analyses, we investigate the impact of several technical and biological factors including simulation model, quantification unit, component number, weight matrix, and unknown content by constructing three benchmarking frameworks. These frameworks cover comparative analysis of 11 popular deconvolution methods under 1766 conditions.
We provide new insights to researchers for future application, standardization, and development of deconvolution tools on RNA-seq data.
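To make the benchmarking idea concrete, the following is a minimal sketch, not the paper's framework or any of the 11 benchmarked methods. It assumes a linear simulation model in which a bulk expression profile is the signature matrix multiplied by a vector of cell-type weights, and it recovers the weights with a simple multiplicative-update non-negative least squares solver chosen here only for illustration. The signature values, the function names (`simulate_bulk`, `deconvolve`, `rmse`), and the true proportions are all hypothetical.

```python
import random


def simulate_bulk(signatures, proportions, noise_sd=0.0, seed=0):
    """Simulate one bulk sample as a linear mixture of cell-type signatures.

    signatures: list of genes, each a list of per-cell-type expression values.
    proportions: cell-type weights (assumed non-negative, summing to 1).
    noise_sd: optional Gaussian noise added per gene (an assumption of this sketch).
    """
    rng = random.Random(seed)
    bulk = []
    for row in signatures:
        value = sum(s * p for s, p in zip(row, proportions))
        if noise_sd > 0:
            value += rng.gauss(0.0, noise_sd)
        bulk.append(max(value, 0.0))  # expression cannot be negative
    return bulk


def deconvolve(signatures, bulk, n_iter=2000, eps=1e-12):
    """Estimate cell-type proportions by non-negative least squares.

    Uses multiplicative updates w <- w * (S^T b) / (S^T S w), which keep the
    weights non-negative, then rescales the weights to sum to 1.
    """
    k = len(signatures[0])
    w = [1.0 / k] * k
    # Precompute S^T b and S^T S once.
    stb = [sum(row[i] * b for row, b in zip(signatures, bulk)) for i in range(k)]
    sts = [[sum(row[i] * row[j] for row in signatures) for j in range(k)]
           for i in range(k)]
    for _ in range(n_iter):
        denom = [sum(sts[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = [w[i] * stb[i] / (denom[i] + eps) for i in range(k)]
    total = sum(w)
    return [x / total for x in w]


def rmse(estimated, truth):
    """Root-mean-square error between estimated and true proportions."""
    n = len(truth)
    return (sum((e - t) ** 2 for e, t in zip(estimated, truth)) / n) ** 0.5


# Hypothetical signature matrix: 6 genes x 3 cell types.
signatures = [
    [10.0, 1.0, 1.0],
    [8.0, 2.0, 1.0],
    [1.0, 9.0, 2.0],
    [2.0, 7.0, 1.0],
    [1.0, 1.0, 12.0],
    [1.0, 2.0, 9.0],
]
true_proportions = [0.6, 0.3, 0.1]

bulk = simulate_bulk(signatures, true_proportions)
estimated = deconvolve(signatures, bulk)
print("estimated proportions:", [round(x, 3) for x in estimated])
print("RMSE vs. truth:", rmse(estimated, true_proportions))
```

In a benchmark of this kind, factors such as the noise level, the number of components, or the weight vector would be varied and the error recorded for each condition; setting `noise_sd` above zero or dropping a column from `signatures` (to mimic unknown content) are simple ways to explore that with this sketch.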