Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data.

Affiliations

Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.

Department of Automation, Xiamen University, Xiamen 361005, China.

Publication information

Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzae014.

Abstract

Gene set scoring (GSS) is routinely used in gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, where it helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating chromatin-based gene regulation in single cells, and genes or gene sets with dynamic regulatory potential can be regarded as cell type-specific markers, as in single-cell RNA-seq (scRNA-seq). However, few GSS tools are designed specifically for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools: four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq data, suggesting that they are applicable to scATAC-seq. Then, the performance of the different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. The results show that dropout imputation significantly improves the performance of almost all GSS tools, whereas the impact of gene activity conversion methods or gene set collections depends more on the specific GSS tool or dataset. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.
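To make the idea of gene set scoring concrete, the sketch below computes one score per cell from a gene-by-cell gene activity matrix by averaging the standardized activity of the genes in a set. It is an illustrative baseline only and is not one of the ten benchmarked tools. The function names, the mean z-score formula, and the toy activity values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each gene (row) across cells to mean 0 and unit variance."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # A gene with no variation across cells carries no signal; map it to zeros.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def gene_set_scores(activity, genes, gene_set):
    """Average the standardized activity of the set's genes in each cell."""
    scaled = zscore_rows(activity)
    idx = [i for i, g in enumerate(genes) if g in gene_set]
    if not idx:
        raise ValueError("no gene in the set is present in the matrix")
    n_cells = len(activity[0])
    return [mean(scaled[i][c] for i in idx) for c in range(n_cells)]


# Toy gene-by-cell gene activity matrix (made-up values, 4 genes x 4 cells).
# Cells 0-1 mimic T cells and cells 2-3 mimic B cells.
genes = ["CD3E", "CD3D", "MS4A1", "CD79A"]
activity = [
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
]
t_cell_scores = gene_set_scores(activity, genes, {"CD3E", "CD3D"})
print(t_cell_scores)
```

In this toy matrix the T cell marker set scores positive in the first two cells and negative in the last two. With real scATAC-seq data the matrix would first have to be derived from peak counts by a gene activity conversion method, which is one of the preprocessing choices the benchmark evaluates.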

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efc2/11423854/813d5a6d7709/qzae014f1.jpg
