School of Artificial Intelligence, Jilin University, Jilin, China.
Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab125.
Gene-expression profiling can define the cell state and gene-expression pattern of cells at the genetic level in a high-throughput manner. With the development of transcriptome techniques, processing high-dimensional genetic data has become a major challenge in expression profiling. Following the recent widespread use of matrix decomposition methods in bioinformatics, a computational framework based on compressed sensing has been adopted to reduce dimensionality. However, compressed sensing requires an optimization strategy to learn the modular dictionaries and activity levels from low-dimensional random composite measurements in order to reconstruct the high-dimensional gene-expression data. With this in mind, we introduce and compare four compressed sensing frameworks derived from nature-inspired optimization algorithms (CSCS, ABCCS, BACS and FACS) to improve the quality of the decompression process. Experiments on nine different datasets establish that the three proposed methods outperform benchmark methods, the FACS method in particular. We therefore illustrate the robustness and convergence of FACS in various aspects; notably, time complexity and parameter analyses highlight the properties of the proposed FACS. Furthermore, differential gene-expression analysis, cell-type clustering, gene ontology enrichment and pathology analysis are conducted, which bring novel insights into cell-type identification and characterization mechanisms from different perspectives. All algorithms are written in Python and available at https://github.com/Philyzh8/Nature-inspired-CS.
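To make the decompression setting concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes the expression matrix factorizes as X ≈ UW (module dictionary U, sparse activity levels W) and that only random composite measurements Y = AX are observed. As a simplifying assumption, the dictionary U is treated as known, so the activity levels can be recovered by ordinary least squares; the frameworks described above instead learn U and W with nature-inspired optimizers, which this sketch does not attempt. The function name, matrix sizes and sparsity level are all hypothetical.

```python
import numpy as np


def simulate_and_recover(genes=200, samples=50, modules=5, measurements=20, seed=0):
    """Toy compressed-sensing round trip: measure X as Y = A X, then rebuild X.

    Assumption (for illustration only): the module dictionary U is known.
    """
    rng = np.random.default_rng(seed)

    # Synthetic ground truth: X = U W with a sparse activity matrix W.
    U = rng.random((genes, modules))          # module dictionary (genes x modules)
    W = rng.random((modules, samples))        # activity levels (modules x samples)
    W[rng.random(W.shape) < 0.5] = 0.0        # zero out roughly half the entries
    X = U @ W                                 # high-dimensional expression data

    # Low-dimensional random composite measurements.
    A = rng.standard_normal((measurements, genes))
    Y = A @ X                                 # measurements x samples

    # Decompression with known U: solve (A U) W_hat = Y in the least-squares sense.
    W_hat, *_ = np.linalg.lstsq(A @ U, Y, rcond=None)
    X_hat = U @ W_hat

    rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    return X, X_hat, rel_err


if __name__ == "__main__":
    X, X_hat, rel_err = simulate_and_recover()
    print(f"relative reconstruction error: {rel_err:.2e}")
```

In this noise-free toy case, with more measurements than modules, the reconstruction error is at the level of floating-point round-off. The harder problem addressed by the abstract is that U is unknown and must be learned jointly with W from Y, which is where the choice of optimization strategy matters.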