de Langen Pierre, Ballester Benoit
Aix-Marseille Univ, INSERM, TAGC, Marseille, France.
NAR Genom Bioinform. 2024 May 14;6(2):lqae051. doi: 10.1093/nargab/lqae051. eCollection 2024 Jun.
The large diversity of functional genomic assays allows the characterization of non-coding and coding events at the tissue level or at single-cell resolution. However, this diversity also leads to protocol differences, widely varying sequencing depths, and substantial disparities in sample sizes and numbers of features. In this work, we have built a Python package, MUFFIN, which offers a wide variety of tools suitable for a broad range of genomic assays and brings many tools that were missing from the Python ecosystem. First, MUFFIN has specialized tools for the exploration of the non-coding regions of genomes, such as a function to identify consensus peaks in peak-called assays, as well as functions for linking genomic regions to genes and performing gene set enrichment analyses. MUFFIN also provides a robust and flexible count table processing pipeline, comprising normalization, count transformation, dimensionality reduction, differential expression analysis, and clustering. Our tools were tested on three widely different datasets: scRNA-seq, ChIP-seq and ATAC-seq. MUFFIN integrates with the popular Scanpy ecosystem and is available on Conda and at https://github.com/pdelangen/Muffin.
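The first steps of the count table pipeline named in the abstract (normalization, count transformation, dimensionality reduction) can be sketched generically with NumPy. This is not MUFFIN's API and not necessarily the method the package uses: the function names, the library-size scaling, the log1p transform, and the SVD-based PCA are all illustrative assumptions, and the input is simulated toy data.

```python
import numpy as np


def normalize_counts(counts):
    """Scale each sample (row) to the mean library size.

    Assumption: simple library-size scaling, chosen for illustration only.
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=1, keepdims=True)
    # Replace zero library sizes by 1 to avoid division by zero.
    safe_sizes = np.where(lib_sizes == 0, 1.0, lib_sizes)
    return counts / safe_sizes * lib_sizes.mean()


def log_transform(normed):
    """Variance-dampening transform: log(1 + x) (an assumed choice)."""
    return np.log1p(normed)


def pca(matrix, n_components=2):
    """Project samples onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy count table: 6 samples (rows) x 20 features (columns), simulated.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=5.0, size=(6, 20))

normed = normalize_counts(toy_counts)
embedding = pca(log_transform(normed), n_components=2)
```

The resulting `embedding` holds one two-dimensional coordinate per sample, which is the kind of reduced representation that downstream clustering would typically operate on.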