Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

Author information

Fontana Paolo, Cestaro Alessandro, Velasco Riccardo, Formentin Elide, Toppo Stefano

Affiliation

FEM-IASMA Research Center, San Michele all'Adige (TN), Italy.

Publication information

PLoS One. 2009;4(2):e4619. doi: 10.1371/journal.pone.0004619. Epub 2009 Feb 27.

DOI:10.1371/journal.pone.0004619

PMID:19247487

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2645684/

Abstract

BACKGROUND

Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task.

METHODOLOGY

We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to process quickly thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results.
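The combination described above (hits weighted by how well they match, GO terms grouped by semantic similarity) can be illustrated with a small sketch. This is not ARGOT's published algorithm: the toy ontology, the annotation counts, the use of Lin similarity, the -log10(e-value) weight, and all function names are assumptions chosen only to make the idea concrete.

```python
import math
from collections import defaultdict

# Toy ontology (child -> parents); term names are hypothetical, not real GO IDs.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}
# Hypothetical annotation counts per term (descendants included), used for IC.
COUNTS = {"root": 100, "binding": 60, "catalysis": 40,
          "dna_binding": 20, "rna_binding": 25, "kinase": 10}


def ancestors(term):
    """Return the term plus all of its ancestors in the toy DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def ic(term):
    """Information content: -log of the term's annotation frequency."""
    return -math.log(COUNTS[term] / COUNTS["root"])


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    denom = ic(a) + ic(b)
    if denom == 0:
        return 1.0 if a == b else 0.0
    mica = max(ic(t) for t in ancestors(a) & ancestors(b))
    return 2 * mica / denom


def score_terms(hits):
    """Weight each hit by -log10(e-value); add it to the hit's terms and their ancestors."""
    scores = defaultdict(float)
    for evalue, terms in hits:
        weight = -math.log10(evalue)
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            scores[term] += weight
    return dict(scores)


def group_terms(terms, threshold):
    """Greedy grouping: a term joins the first group whose seed term is similar enough."""
    groups = []
    for term in terms:
        for group in groups:
            if lin_similarity(term, group[0]) >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])
    return groups


# Hypothetical BLAST hits: (e-value, GO-like terms annotated on the hit).
hits = [(1e-50, ["dna_binding"]), (1e-20, ["rna_binding"]), (1e-5, ["kinase"])]
scores = score_terms(hits)
groups = group_terms(["dna_binding", "rna_binding", "kinase"], threshold=0.3)
```

With these toy numbers the two binding terms land in one group and the kinase term in another, while the shared parent term accumulates the weight of both binding hits. A real pipeline would read the ontology and annotation frequencies from GO releases and the hits from actual BLAST output.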

CONCLUSIONS

The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9928/2645684/24068f88701d/pone.0004619.g001.jpg

Similar articles

Argot2: a large scale function prediction tool relying on semantic similarity of weighted Gene Ontology terms.

BMC Bioinformatics. 2012 Mar 28;13 Suppl 4(Suppl 4):S14. doi: 10.1186/1471-2105-13-S4-S14.

Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae.

BMC Microbiol. 2009 Feb 19;9 Suppl 1(Suppl 1):S8. doi: 10.1186/1471-2180-9-S1-S8.

A sensitive method for computing GO-based functional similarities among genes with 'shallow annotation'.

Gene. 2012 Nov 1;509(1):131-5. doi: 10.1016/j.gene.2012.07.078. Epub 2012 Aug 10.

Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data.

BMC Bioinformatics. 2008 Nov 6;9:472. doi: 10.1186/1471-2105-9-472.

AVID: an integrative framework for discovering functional relationships among proteins.

BMC Bioinformatics. 2005 Jun 1;6:136. doi: 10.1186/1471-2105-6-136.

Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO).

Nucleic Acids Res. 2002 Jan 1;30(1):69-72. doi: 10.1093/nar/30.1.69.

GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes.

BMC Bioinformatics. 2004 Nov 18;5:178. doi: 10.1186/1471-2105-5-178.

Novel symmetry-based gene-gene dissimilarity measures utilizing Gene Ontology: Application in gene clustering.

Gene. 2018 Dec 30;679:341-351. doi: 10.1016/j.gene.2018.08.062. Epub 2018 Sep 2.

GO functional similarity clustering depends on similarity measure, clustering method, and annotation completeness.

BMC Bioinformatics. 2019 Mar 27;20(1):155. doi: 10.1186/s12859-019-2752-2.

Cited by

Classification and Computational Analysis of Sperm Cell-Specific F-Box Protein Gene .

Front Genet. 2020 Dec 14;11:609668. doi: 10.3389/fgene.2020.609668. eCollection 2020.

Clues of in vivo nuclear gene regulation by mitochondrial short non-coding RNAs.

Sci Rep. 2020 May 19;10(1):8219. doi: 10.1038/s41598-020-65084-z.

Fast Regulation of Hormone Metabolism Contributes to Salt Tolerance in Rice ( spp. Japonica, L.) by Inducing Specific Morpho-Physiological Responses.

Plants (Basel). 2018 Sep 15;7(3):75. doi: 10.3390/plants7030075.

Quantitative multiplexed proteomics of Taenia solium cysts obtained from the skeletal muscle and central nervous system of pigs.

PLoS Negl Trop Dis. 2017 Sep 25;11(9):e0005962. doi: 10.1371/journal.pntd.0005962. eCollection 2017 Sep.

Impacts of the overexpression of a tomato translationally controlled tumor protein (TCTP) in tobacco revealed by phenotypic and transcriptomic analysis.

Plant Cell Rep. 2017 Jun;36(6):887-900. doi: 10.1007/s00299-017-2117-0. Epub 2017 Mar 4.

Grouping miRNAs of similar functions via weighted information content of gene ontology.

BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):507. doi: 10.1186/s12859-016-1367-0.

Eliciting the Functional Taxonomy from protein annotations and taxa.

Sci Rep. 2016 Aug 18;6:31971. doi: 10.1038/srep31971.

Proteomic Study of Entamoeba histolytica Trophozoites, Cysts, and Cyst-Like Structures.

PLoS One. 2016 May 26;11(5):e0156018. doi: 10.1371/journal.pone.0156018. eCollection 2016.

Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks.

Methods. 2016 Jan 15;93:84-91. doi: 10.1016/j.ymeth.2015.09.011. Epub 2015 Sep 11.

INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity.

Nucleic Acids Res. 2015 Jul 1;43(W1):W134-40. doi: 10.1093/nar/gkv523. Epub 2015 May 27.
