Paisley Brianna M, Liu Yunlong
Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, United States.
Toxicology, Eli Lilly and Company, Indianapolis, IN, United States.
Front Genet. 2021 Oct 26;12:763431. doi: 10.3389/fgene.2021.763431. eCollection 2021.
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study cellular heterogeneity. Accurate cell type identification is crucial for scRNA-seq analyses to be valid and robust. Marker genes, which are genes specific to one or a few cell types, can improve cell type classification; however, their specificity varies across species, samples, and cell subtypes. Current marker gene databases lack standardization, consideration of cell hierarchy, sample diversity, and/or the flexibility to be updated as new data become available. Most of these databases are derived from a single statistical analysis, even though many such analyses identifying marker genes from scRNA-seq data and pure cell populations are scattered throughout the literature. An R Shiny web tool, GeneMarkeR, was developed so that researchers can retrieve marker genes that demonstrate cell type specificity across species, methodologies, and sample types, based on a novel algorithm. The web tool supports online submission and interfaces with MySQL to ensure updatability. Furthermore, the tool uses reactive programming to let researchers retrieve the standardized public data supporting each marker gene. GeneMarkeR currently hosts over 261,000 rows of standardized marker gene results from 25 studies, covering 21,012 unique genomic entities and 99 unique cell types mapped to hierarchical ontologies.