Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

Author information

Bioinformatics Program, University of Memphis, Memphis, Tennessee, United States of America.

Publication information

PLoS One. 2011 Apr 14;6(4):e18851. doi: 10.1371/journal.pone.0018851.

Abstract

High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in the GO biological process category and 90% of the gene sets in the GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.

AVAILABILITY

GCAT is freely available at http://binf1.memphis.edu/gcat.
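
As a rough illustration of the approach described in the abstract, the Python sketch below builds a toy LSI space from a small term-by-gene count matrix, derives gene-to-gene cosine similarities, and scores a gene set with a one-sided Fisher's exact test. The abstract does not say how the contingency table is built, so the layout used here (gene pairs inside versus outside the set, crossed with similarity above versus below a cutoff) is an assumption, as are the toy matrix, the rank k = 4, the cutoff 0.5, and all function names. It is not the GCAT implementation.

```python
import math
import numpy as np


def lsi_gene_vectors(term_gene_matrix, k):
    """Project each gene (one column per gene) into a k-dimensional
    latent space using a truncated SVD, the core step of LSI."""
    _, s, vt = np.linalg.svd(term_gene_matrix, full_matrices=False)
    return (s[:k, None] * vt[:k, :]).T  # one row per gene


def cosine_matrix(vectors):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the 2x2 table
    [[a, b], [c, d]], computed from the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return favourable / math.comb(n, row1)


def cohesion_p_value(sim, set_idx, threshold):
    """Assumed table layout (not taken from the source): rows are gene
    pairs inside vs. outside the set, columns are similarity at or above
    vs. below `threshold`."""
    n = sim.shape[0]
    in_set = np.zeros(n, dtype=bool)
    in_set[list(set_idx)] = True
    i, j = np.triu_indices(n, k=1)  # each unordered gene pair once
    pair_in_set = in_set[i] & in_set[j]
    pair_high = sim[i, j] >= threshold
    a = int(np.sum(pair_in_set & pair_high))
    b = int(np.sum(pair_in_set & ~pair_high))
    c = int(np.sum(~pair_in_set & pair_high))
    d = int(np.sum(~pair_in_set & ~pair_high))
    return fisher_right_tail(a, b, c, d)


# Hypothetical toy data: rows are terms, columns are six genes.
# Genes 0-2 share vocabulary; genes 3-5 each use a term of their own.
toy_counts = np.array([
    [2, 1, 2, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 3, 0],
    [0, 0, 0, 0, 0, 3],
], dtype=float)

sim = cosine_matrix(lsi_gene_vectors(toy_counts, k=4))
p_cohesive = cohesion_p_value(sim, [0, 1, 2], threshold=0.5)
p_mixed = cohesion_p_value(sim, [0, 3, 4], threshold=0.5)
print(p_cohesive, p_mixed)
```

In this toy setting the set of genes that share vocabulary receives a small p-value, while the mixed set does not. With real data, the rank, the similarity cutoff, and the background set of genes would all need to be chosen and validated rather than taken from this sketch.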
