Wolfram-Schauerte Maik, Vogel Thomas, Tuoken Hanati, Fälth Savitski Maria, Simon Eric, Nieselt Kay
Faculty of Science, Department of Computer Science, Eberhard-Karls University Tübingen, Sand 14, D-72076 Tübingen, Baden-Württemberg, Germany.
Computational Innovation, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, D-88397 Biberach, Baden-Württemberg, Germany.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf388.
Tissues, organs, and entire organisms are composed of diverse cell populations, which are characterized by cell-type-specific gene activities. Bulk RNA-seq is a robust, cost-effective, scalable method to measure gene activity at the bulk tissue level. However, pathomolecular processes lead to divergent changes in tissue composition and to cell-type-specific gene deregulation, which cannot be resolved at the bulk tissue level without information on either cell-type proportions or single-cell expression. Accordingly, methods have been developed that constrain bulk deconvolution with information from single-cell expression or cell-type proportions. In parallel, convolution methods have been developed to project single-cell expression to the bulk tissue level (pseudobulk simulation). In the present review, we provide an overview of existing convolution and deconvolution methods, their interconnectivity, and their benchmarking. Our unique approach lies in the joint consideration of both directions in a "holistic transcriptome model." Through analysis of published (de)convolution studies and benchmarks, we identified the limited availability of suitable datasets and the use of inaccurate convolution-like methods for (de)convolution model assessment and training as key bottlenecks in the field. On that basis, we conclude by outlining a holistic transcriptome model, arguing that a more integrated approach to convolution and deconvolution is needed. With our suggestions for a unified framework, we aim to spark collaborative efforts to enable major leaps forward in the field of (de)convolution.
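To make the two directions concrete, here is a minimal sketch, not taken from the review, of a convolution/deconvolution round trip on hypothetical toy data. Convolution is modeled in the simplest possible way, by summing a genes x cells count matrix into one pseudobulk profile; deconvolution uses a reference signature of per-cell-type mean expression and solves bulk ≈ signature @ weights by ordinary least squares, clipping negative weights and rescaling to proportions. The function names (`simulate_pseudobulk`, `build_signature`, `deconvolve`), the Poisson toy data, and the clipped least-squares solver are all illustrative assumptions, not any published method. Note that this naive summation is exactly the kind of simplified, convolution-like simulation the review flags as a bottleneck: because the signature is built from the same cells, the round trip is exact here, which real bulk data will not be.

```python
import numpy as np


def simulate_pseudobulk(sc_counts: np.ndarray, cell_types: np.ndarray) -> tuple[np.ndarray, dict[str, float]]:
    """Convolution (naive): sum a genes x cells count matrix into one pseudobulk profile.

    Also returns the ground-truth cell-type proportions of the simulated sample.
    """
    pseudobulk = sc_counts.sum(axis=1)
    labels, n_cells = np.unique(cell_types, return_counts=True)
    truth = {str(lab): float(n / cell_types.size) for lab, n in zip(labels, n_cells)}
    return pseudobulk, truth


def build_signature(sc_counts: np.ndarray, cell_types: np.ndarray) -> tuple[np.ndarray, list[str]]:
    """Reference: mean expression per cell type, shape genes x cell types."""
    labels = np.unique(cell_types)
    signature = np.column_stack(
        [sc_counts[:, cell_types == lab].mean(axis=1) for lab in labels]
    )
    return signature, [str(lab) for lab in labels]


def deconvolve(bulk: np.ndarray, signature: np.ndarray, labels: list[str]) -> dict[str, float]:
    """Deconvolution: least-squares fit of bulk ≈ signature @ weights.

    Negative weights are clipped to zero and the rest rescaled to sum to one;
    this is a simplification, not a dedicated constrained solver.
    """
    weights, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    weights = np.clip(weights, 0.0, None)
    total = weights.sum()
    if total <= 0:
        raise ValueError("all fitted weights are non-positive")
    return {lab: float(w / total) for lab, w in zip(labels, weights)}


# Hypothetical toy data: 50 genes, 3 cell types with 60/30/10 cells.
rng = np.random.default_rng(0)
n_genes = 50
type_means = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, 3))
type_index = np.repeat([0, 1, 2], [60, 30, 10])
cell_types = np.array(["A", "B", "C"])[type_index]
sc_counts = rng.poisson(type_means[:, type_index])  # genes x cells counts

bulk, truth = simulate_pseudobulk(sc_counts, cell_types)
signature, labels = build_signature(sc_counts, cell_types)
estimate = deconvolve(bulk, signature, labels)
# Summing cells equals signature @ (cells per type), so the estimate
# matches the true proportions up to floating-point error.
print(truth, estimate)
```

Because every simulated sample has known proportions, this toy setup also shows why pseudobulk simulation is attractive for benchmarking and training deconvolution models, and why the realism of the convolution step matters for how far such results transfer to real bulk RNA-seq.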