Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets.

Affiliation

Bioinformatics Program, University of Memphis, Memphis, TN 38152, USA.

Publication information

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S19. doi: 10.1186/1471-2105-12-S10-S19.

Abstract

BACKGROUND

Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for the presence of DNA binding motifs in the promoter regions of co-regulated genes. However, this strategy may not always be useful, because the presence of a motif does not necessarily imply a regulatory role. Conversely, motif presence may not be required for a TF to regulate a set of genes. Therefore, it is imperative to include functional (biochemical and molecular) associations, such as those found in the biomedical literature, in algorithms for identification of putative regulatory TFs that might be explicitly or implicitly linked to the genes under investigation.

RESULTS

In this study, we present a Latent Semantic Indexing (LSI) based text mining approach for identification and ranking of putative regulatory TFs from microarray derived differentially expressed genes (DEGs). Two LSI models were built using different term weighting schemes to derive pair-wise similarities between 21,027 mouse genes annotated in the Entrez Gene repository. Amongst these genes, 433 were designated TFs in the TRANSFAC database. The LSI derived TF-to-gene similarities were used to calculate TF literature enrichment p-values and rank the TFs for a given set of genes. We evaluated our approach using five different publicly available microarray datasets focusing on the TFs Rel, Stat6, Ddit3, Stat5 and Nfic. In addition, for each of the datasets, we constructed a gold standard set of TFs known to be functionally relevant to the study in question. Receiver Operating Characteristic (ROC) curves showed that, in identifying putative TFs, the log-entropy LSI model outperformed the tf-normal LSI model and a benchmark co-occurrence based method for four out of five datasets, and also outperformed motif searching approaches.
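The term weighting step mentioned above can be illustrated with a minimal sketch. It assumes the commonly used log-entropy formulation (local weight log(1 + tf), global weight 1 + Σ p·log p / log n); the abstract does not give the exact formula used in the study, so this may differ in detail. The function names `log_entropy_weights` and `cosine` and the toy count matrix are hypothetical. The truncated SVD step that a full LSI model applies before computing similarities is deliberately omitted here.

```python
import math

def log_entropy_weights(counts):
    """Apply log-entropy weighting to a term-by-document count matrix.

    counts[i][j] is the raw frequency of term i in gene-document j
    (e.g. the concatenated abstracts for gene j). Assumes at least
    two documents. This is the textbook scheme, not necessarily the
    exact variant used in the study.
    """
    n_docs = len(counts[0])
    if n_docs < 2:
        raise ValueError("log-entropy weighting needs at least two documents")
    weighted = []
    for row in counts:
        total = sum(row)
        # Sum of p * log(p) over documents in which the term occurs.
        plogp = 0.0
        for tf in row:
            if tf > 0:
                p = tf / total
                plogp += p * math.log(p)
        # Global weight: near 0 for evenly spread terms, 1 for terms
        # concentrated in a single document.
        g = 1.0 + plogp / math.log(n_docs)
        # Local weight log(1 + tf), scaled by the global weight.
        weighted.append([g * math.log(1.0 + tf) for tf in row])
    return weighted

def cosine(matrix, j, k):
    """Cosine similarity between document columns j and k."""
    a = [row[j] for row in matrix]
    b = [row[k] for row in matrix]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy data (invented): 3 terms x 3 gene-documents.
toy_counts = [
    [2, 2, 2],  # term spread evenly over all documents
    [3, 0, 0],  # term specific to document 0
    [0, 1, 1],  # term shared by documents 1 and 2
]
toy_weighted = log_entropy_weights(toy_counts)
```

In this toy matrix the evenly spread term receives a weight of essentially zero, so it contributes nothing to similarity, while documents 1 and 2 end up maximally similar because they share their only informative term. In a full LSI model, the weighted matrix would first be reduced with a truncated SVD, and similarities would be computed in the reduced space.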

CONCLUSIONS

Our results suggest that our LSI based text mining approach can complement existing approaches used in systems biology research to decipher gene regulatory networks by providing putative lists of ranked TFs that might be explicitly or implicitly associated with sets of DEGs derived from microarray experiments. In addition, unlike motif searching approaches, LSI based approaches can reveal TFs that may indirectly regulate genes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2013/3236841/15717bb4a58d/1471-2105-12-S10-S19-1.jpg
