

CardioTF, a database of deconstructing transcriptional circuits in the heart system.

Author information

Zhen Yisong

Affiliation

State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Publication information

PeerJ. 2016 Aug 23;4:e2339. doi: 10.7717/peerj.2339. eCollection 2016.

Abstract

BACKGROUND

Information on cardiovascular gene transcription is fragmented and lags far behind the current needs of systems biology. The CardioTF database was constructed to provide a comprehensive data source for cardiovascular gene regulation and to support deeper interpretation of genomic data. It collates information on cardiovascular transcription factors (TFs), their position weight matrices (PWMs), and enhancer sequences identified by ChIP-seq.
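To illustrate what a PWM encodes, the sketch below converts a small count matrix into per-position log-odds scores and slides it along a DNA sequence to find the best-matching window. The matrix is a hypothetical "GATA"-like toy motif, not an entry from Jaspar, hPDI, or UniPROBE, and the scoring convention (pseudocounts, log2 odds against a uniform 0.25 background, summed over positions) is a common one assumed here, not necessarily the one CardioTF uses.

```python
import math

BASES = "ACGT"

# Hypothetical counts for a 4-bp "GATA"-like motif (illustrative only, not real data).
TOY_COUNTS = [
    {"A": 1, "C": 0, "G": 18, "T": 1},
    {"A": 18, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 1, "G": 1, "T": 18},
    {"A": 18, "C": 0, "G": 1, "T": 1},
]


def counts_to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Turn a position frequency matrix into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window, or None if no window is valid."""
    width = len(pwm)
    seq = sequence.upper()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


TOY_PWM = counts_to_log_odds(TOY_COUNTS)
print(best_hit(TOY_PWM, "CCGATACC"))  # best window starts at index 2 ("GATA")
```

Positive scores mean the window looks more like the motif than like background sequence; a real scan would also apply a score threshold and check both strands.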

METHODS

A Naïve-Bayes classifier was used to classify the literature and identify all PubMed abstracts on cardiovascular development. The gene-name recognition and normalization tool GNAT was then used to identify the gene names mentioned in these abstracts. Local Perl scripts integrated data from public databases and loaded them into a MariaDB (MySQL-compatible) database management system. In-house R scripts were written to analyze and visualize the results.
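The abstract does not specify the features, training corpus, or implementation of the Naïve-Bayes classifier. As a minimal sketch only, the code below implements a multinomial Naïve-Bayes text classifier with bag-of-words features and add-one (Laplace) smoothing. The class labels (`cardio`, `other`), the tokenizer, and the six training sentences are hypothetical stand-ins, not CardioTF's training data; the GNAT gene-name step and the Perl/MariaDB loading step are not modeled.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer for illustration."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for doc, label in zip(documents, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_joint(self, document, label):
        """log P(label) + sum of log P(token | label); unnormalized log posterior."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        vocab_size = len(self.vocabulary)
        for token in tokenize(document):
            if token not in self.vocabulary:
                continue  # ignore tokens never seen in training
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (self.total_words[label] + vocab_size))
        return score

    def predict(self, document):
        return max(self.class_counts, key=lambda c: self.log_joint(document, c))


# Hypothetical toy training set (not from the paper).
TRAIN_DOCS = [
    "Nkx2-5 and Gata4 regulate cardiomyocyte differentiation during heart development",
    "Tbx5 mutations cause congenital heart defects in the developing heart",
    "Hand2 controls outflow tract and ventricular morphogenesis in the embryonic heart",
    "Insulin signaling regulates glucose uptake in skeletal muscle",
    "Neuronal migration in the developing cortex requires reelin",
    "Hepatic lipid metabolism is altered by dietary fat",
]
TRAIN_LABELS = ["cardio"] * 3 + ["other"] * 3

clf = NaiveBayesClassifier().fit(TRAIN_DOCS, TRAIN_LABELS)
print(clf.predict("Gata4 is required for heart development"))  # cardio
```

Working in log space avoids underflow when many small word probabilities are multiplied; a production classifier would train on a large labeled set of PubMed abstracts.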

RESULTS

Known human cardiovascular TFs, together with their homologs in fly, Ciona, zebrafish, frog, chicken, and mouse, were identified and deposited in the database. PWMs from the Jaspar, hPDI, and UniPROBE databases were also deposited and can be retrieved by the corresponding TF name. Enhancer regions derived from various ChIP-seq datasets were deposited as well and can be visualized graphically. In addition to biocuration, the mouse homologs of 81 core cardiac TFs were selected using a Naïve-Bayes approach followed by intersection of four independent data sources: RNA profiling, expert annotation, PubMed abstracts, and phenotype.
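The final selection step intersects four independent evidence sources. The sketch below shows that set operation, assuming each source has already been reduced to a set of mouse gene symbols. The source names and the gene memberships are invented for illustration and are not the paper's inputs or its 81-TF result; the per-gene support count is an added convenience, not something the abstract describes.

```python
from functools import reduce


def core_candidates(sources):
    """Return (genes present in every source, {gene: number of sources listing it})."""
    if not sources:
        return set(), {}
    shared = reduce(set.intersection, (set(genes) for genes in sources.values()))
    support = {}
    for genes in sources.values():
        for gene in genes:
            support[gene] = support.get(gene, 0) + 1
    return shared, support


# Hypothetical evidence sets; memberships are made up for this example.
SOURCES = {
    "rna_profiling": {"Gata4", "Nkx2-5", "Tbx5", "Hand2", "Mef2c", "Sox2"},
    "expert_annotation": {"Gata4", "Nkx2-5", "Tbx5", "Hand2", "Mef2c"},
    "pubmed_abstracts": {"Gata4", "Nkx2-5", "Tbx5", "Mef2c", "Foxa2"},
    "phenotype": {"Gata4", "Nkx2-5", "Tbx5", "Hand2"},
}

shared, support = core_candidates(SOURCES)
print(sorted(shared))  # ['Gata4', 'Nkx2-5', 'Tbx5']
```

A strict intersection discards any gene missing from even one source (here `Hand2` and `Mef2c`, each supported by three of four), so the support counts are useful for reviewing near misses. Real inputs would also need consistent gene symbols across sources before intersecting.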

DISCUSSION

The CardioTF database can serve as a portal for constructing the transcriptional network of cardiac development.

AVAILABILITY AND IMPLEMENTATION

Database URL: http://www.cardiosignal.org/database/cardiotf.html.


