The language of gene ontology: a Zipf's law analysis.

Affiliation

School of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, UK.

Publication information

BMC Bioinformatics. 2012 Jun 7;13:127. doi: 10.1186/1471-2105-13-127.

DOI: 10.1186/1471-2105-13-127
PMID: 22676436
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3473240/
Abstract

BACKGROUND

Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language.
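
Zipf's law says that the frequency f of the r-th most frequent word falls off roughly as f(r) ≈ C·r^(−α), so the exponent α is the negated slope of log f plotted against log r. The following is a minimal Python sketch of how such an exponent can be estimated from a list of tokens. It assumes the simplest estimator, an ordinary least-squares fit on the log-log rank-frequency data; the abstract does not say which fitting method the authors used, and maximum-likelihood estimators are generally more reliable on real data. The function names are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(tokens):
    """Estimate alpha in f(r) ~ C * r**(-alpha) from a list of tokens.

    Uses an ordinary least-squares fit of log(frequency) on log(rank).
    This is a rough estimator chosen for brevity (an assumption, not
    necessarily the method used in the paper).
    """
    freqs = rank_frequency(tokens)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope equals -alpha, so negate it.
    return -sxy / sxx
```

For a natural-language text, `zipf_exponent(text.lower().split())` gives a crude estimate of α; classic Zipf behaviour corresponds to α close to 1.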

RESULTS

Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotations captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation.
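
Comparing exponents per sub-ontology and per evidence code can be sketched as follows, under stated assumptions: each annotation's GO ID is treated as one "word" of the corpus; the input is tab-separated GAF 2.x text as distributed by GOA (GO ID in column 5, evidence code in column 7, aspect F/P/C in column 9); and the exponent is estimated with the same simple log-log least-squares fit. The helper names and the `EXPERIMENTAL` set (the core GO experimental evidence codes) are illustrative choices, not the authors' pipeline.

```python
import math
from collections import Counter

# 0-based column positions in a GAF 2.x annotation line.
GO_ID_COL, EVIDENCE_COL, ASPECT_COL = 4, 6, 8

# Core experimental evidence codes (illustrative filter set).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def go_terms(gaf_lines, aspect, evidence=None):
    """Yield GO IDs for one aspect ('F', 'P' or 'C').

    If `evidence` is given, only annotations whose evidence code is in
    that set are kept.
    """
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= ASPECT_COL:
            continue  # skip malformed lines with too few columns
        if cols[ASPECT_COL] != aspect:
            continue
        if evidence is not None and cols[EVIDENCE_COL] not in evidence:
            continue
        yield cols[GO_ID_COL]


def zipf_exponent(words):
    """Negated OLS slope of log(frequency) against log(rank)."""
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct GO terms")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx
```

For example, with `lines` read from a GAF file, comparing `zipf_exponent(go_terms(lines, "F"))` with `zipf_exponent(go_terms(lines, "F", EXPERIMENTAL))` contrasts all molecular function annotations with experimentally supported ones only.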

CONCLUSIONS

Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e24/3473240/f42e0acf76c9/1471-2105-13-127-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e24/3473240/1f4695d81b35/1471-2105-13-127-1.jpg
