Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications.

Authors

Yu Haiyuan, Jansen Ronald, Stolovitzky Gustavo, Gerstein Mark

Affiliation

Department of Molecular Biophysics & Biochemistry, Yale University, PO Box 208114, New Haven, CT 06520, USA.

Publication

Bioinformatics. 2007 Aug 15;23(16):2163-73. doi: 10.1093/bioinformatics/btm291. Epub 2007 May 31.

DOI:10.1093/bioinformatics/btm291
PMID:17540677
Abstract

MOTIVATION

Many classifications of protein function, such as Gene Ontology (GO), are organized in directed acyclic graph (DAG) structures. In these classifications, the proteins are terminal leaf nodes, and the categories 'above' them are functional annotations at various levels of specialization. Computing a numerical measure of relatedness between two arbitrary proteins is an important proteomics problem. Moreover, analogous problems are important in other contexts in large-scale information organization, for example the Wikipedia online encyclopedia and the Yahoo and DMOZ web page classification schemes.

RESULTS

Here we develop a simple probabilistic approach for computing this relatedness quantity, which we call the total ancestry method. Our measure is based on counting the number of leaf nodes that share exactly the same set of 'higher up' category nodes in comparison to the total number of classified pairs (i.e. the chance of having the same total ancestry). We show that such a measure is associated with a power-law distribution, which allows quick assessment of the statistical significance of shared functional annotations. We formally compare it with other quantitative functional similarity measures (such as the shortest path within a DAG, the shared lowest common ancestor and Azuaje's information-theoretic similarity) and provide concrete metrics to assess differences. Finally, we provide a practical implementation of our total ancestry measure for GO and the MIPS functional catalog and give two applications of it in specific functional genomics contexts.
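
The counting idea described above can be illustrated with a small sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes that the "total ancestry" of a protein pair is the set of category nodes shared by both proteins, and that the score of a pair is the fraction of all classified pairs with exactly that shared set. The function names, the `parents` and `annotations` dictionaries, and the toy categories are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def ancestors(term, parents, cache):
    """Return the term plus every category reachable through parent links."""
    if term not in cache:
        found = {term}
        for parent in parents.get(term, ()):
            found |= ancestors(parent, parents, cache)
        cache[term] = frozenset(found)
    return cache[term]


def total_ancestry_scores(parents, annotations):
    """Score each protein pair by how common its shared ancestor set is.

    Assumption (not taken from the paper): score = number of pairs with
    exactly the same shared ancestor set / total number of pairs.
    A smaller score means the shared annotation is rarer.
    """
    cache = {}
    # Full set of categories above each protein (the leaf nodes).
    full = {
        protein: frozenset().union(*(ancestors(t, parents, cache) for t in terms))
        for protein, terms in annotations.items()
    }
    # Shared ancestor set for every unordered pair of proteins.
    shared = {
        pair: full[pair[0]] & full[pair[1]]
        for pair in combinations(sorted(full), 2)
    }
    counts = Counter(shared.values())
    total = len(shared)
    return {pair: counts[s] / total for pair, s in shared.items()}


# Hypothetical toy classification: child category -> parent categories.
parents = {
    "kinase": ["enzyme"],
    "protease": ["enzyme"],
    "enzyme": ["root"],
    "transporter": ["root"],
}
# Hypothetical proteins and the categories they are directly annotated with.
annotations = {
    "A": ["kinase"],
    "B": ["kinase"],
    "C": ["protease"],
    "D": ["transporter"],
}

scores = total_ancestry_scores(parents, annotations)
for pair, score in sorted(scores.items(), key=lambda item: item[1]):
    print(pair, round(score, 3))
```

In this toy example the pair (A, B) shares the most specific ancestry and gets the smallest score, while pairs that share only the root get the largest. Because a category may list several parents, the same code handles a DAG as well as a tree.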

AVAILABILITY

The implementations and results are available through our supplementary website at: http://gersteinlab.org/proj/funcsim.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
