Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts.

Affiliation

Bioinformatics Program, University of Memphis, Memphis, Tennessee, United States of America.

Publication information

PLoS One. 2011 Apr 14;6(4):e18851. doi: 10.1371/journal.pone.0018851.

DOI:10.1371/journal.pone.0018851

PMID:21533142

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3077411/

Abstract

High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.

AVAILABILITY

GCAT is freely available at http://binf1.memphis.edu/gcat.
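
The abstract states that gene-to-gene similarities feed a Fisher's exact test, but it does not say how the contingency table is built. The following is a minimal sketch under stated assumptions, not the authors' implementation: a gene pair counts as "related" when its similarity meets a hypothetical `threshold`, and a one-sided test asks whether related pairs are over-represented inside the gene set compared with all background pairs. The function names, the toy similarity values, and the threshold are illustrative only; in the paper the similarities come from the LSI model.

```python
from itertools import combinations
from math import comb


def hypergeom_upper_tail(k, n_draws, n_success, n_total):
    """Return P(X >= k) for a hypergeometric X (one-sided Fisher's exact test)."""
    upper = min(n_draws, n_success)
    favourable = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_draws)


def cohesion_p_value(sim, gene_set, threshold):
    """Assumed cohesion test: are related pairs enriched within gene_set?

    sim maps frozenset({gene_a, gene_b}) to a similarity score and must
    cover every pair of background genes.
    """
    n_total = len(sim)  # all background gene pairs
    n_success = sum(1 for s in sim.values() if s >= threshold)
    set_pairs = [frozenset(p) for p in combinations(sorted(set(gene_set)), 2)]
    k = sum(1 for p in set_pairs if sim[p] >= threshold)
    return hypergeom_upper_tail(k, len(set_pairs), n_success, n_total)


# Toy background of six hypothetical genes: A, B and C are mutually similar.
genes = ["A", "B", "C", "D", "E", "F"]
related = {"A", "B", "C"}
sim = {
    frozenset(pair): (0.9 if set(pair) <= related else 0.1)
    for pair in combinations(genes, 2)
}

p_cohesive = cohesion_p_value(sim, ["A", "B", "C"], threshold=0.5)
p_random = cohesion_p_value(sim, ["A", "D", "E"], threshold=0.5)
print(f"cohesive set: p = {p_cohesive:.4f}")
print(f"unrelated set: p = {p_random:.4f}")
```

In this toy background the three mutually similar genes give a small p-value, while a set with no related pairs gives a p-value of 1. Pairs that share a gene are not statistically independent, so a real analysis would need to address that, along with how the threshold is chosen.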

Figure (pone.0018851.g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fea/3077411/15585237b2b7/pone.0018851.g001.jpg
