Department of Electrical and Computer Engineering.
TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA.
Database (Oxford). 2017 Jan 1;2017. doi: 10.1093/database/bax071.
Although the number of publicly deposited RNA-Seq datasets has increased over the past few years, incomplete annotation of the associated metadata limits their potential use. Because of the importance of RNA splicing in diseases and biological processes, we constructed a database called SFMetaDB by curating datasets related to RNA splicing factors. Our effort focused on RNA-Seq datasets in which splicing factors were knocked down, knocked out or overexpressed, yielding 75 datasets corresponding to 56 splicing factors. These datasets can be used in differential alternative splicing analysis to identify the potential targets of these splicing factors, and in other functional studies. Surprisingly, only ∼15% of all splicing factors have been studied by loss- or gain-of-function experiments using RNA-Seq. In particular, splicing factors with domains from a few dominant Pfam domain families have not been studied. This points to a significant gap that needs to be addressed to fully elucidate the splicing regulatory landscape. Mouse models are already available for ∼20 of the unstudied splicing factors, so studying these splicing factors in vitro and in vivo using RNA-Seq could be a fruitful research direction. Database URL: http://sfmetadb.ece.tamu.edu/