Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine.
Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Bioinformatics. 2019 Mar 15;35(6):1033-1039. doi: 10.1093/bioinformatics/bty709.
Small non-coding RNAs (sncRNAs, <100 nts) are highly abundant RNAs that regulate diverse and often tissue-specific cellular processes by associating with transcription factor complexes or binding to mRNAs. While thousands of sncRNA genes exist in the human genome, no single resource provides searchable, unified annotation, expression and processing information for full sncRNA transcripts and mature RNA products derived from these larger RNAs.
Our goal is to establish a complete catalog of annotation, expression, processing, conservation, tissue-specificity and other biological features for all human sncRNA genes and mature products derived from all major RNA classes. The DASHR (Database of small human non-coding RNAs) v2.0 database is the first to integrate human sncRNA gene and mature product profiles obtained from multiple RNA-seq protocols. Altogether, DASHR integrates sncRNA annotations and >800 curated experiments from ENCODE and GEO/SRA, covering 185 tissues/cell types and multiple RNA-seq protocols, for both the GRCh38/hg38 and GRCh37/hg19 assemblies. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci, 1 678 800 in total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long non-coding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, an interactive experiment-by-locus table view, and sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks.
DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2.
Supplementary data are available at Bioinformatics online.