Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration.

Authors

Xiong Jianghui, Rayner Simon, Luo Kunyi, Li Yinghui, Chen Shanguang

Affiliation

Laboratory of Space Cell and Molecular Biology, China Astronaut Research and Training Center, Beijing, PR China.

Publication information

BMC Bioinformatics. 2006 May 25;7:268. doi: 10.1186/1471-2105-7-268.

DOI:10.1186/1471-2105-7-268

PMID:16725034

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1481625/

Abstract

BACKGROUND

The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process these data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP), which uses the Gene Ontology Index to relate diverse sources of experimental data by creating an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can predict function from the evidence data of unannotated proteins. The method allows us to add to our evidence data almost any experimental data set that relates to protein function and incorporates the Gene Ontology, in order to seek relationships between the different sets.
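To make the idea of an evidence abstraction layer more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each evidence source links an unannotated ORF to partner ORFs, and that the feature for a (source, GO term) pair is the fraction of annotated partners carrying that term. The ORF names, partner lists, annotations and the helper `evidence_vector` are all hypothetical.

```python
# Hypothetical toy data: ORF names, partner lists and annotations are
# invented for illustration and are not taken from the paper.
KNOWN_GO = {
    "ORF_A": {"GO:0006412"},                 # translation
    "ORF_B": {"GO:0006412", "GO:0006810"},   # translation, transport
    "ORF_C": {"GO:0006810"},                 # transport
}

# Each (assumed) evidence source maps an ORF to the ORFs it is linked to.
EVIDENCE = {
    "interaction": {"ORF_X": ["ORF_A", "ORF_B"]},
    "coexpression": {"ORF_X": ["ORF_B", "ORF_C"]},
}

GO_TERMS = sorted({term for terms in KNOWN_GO.values() for term in terms})
SOURCES = sorted(EVIDENCE)


def evidence_vector(orf):
    """Return one feature per (source, GO term) pair.

    Each feature is the fraction of the ORF's annotated partners in that
    source that carry the GO term (0.0 when there are no annotated partners).
    """
    features = []
    for source in SOURCES:
        partners = [p for p in EVIDENCE[source].get(orf, []) if p in KNOWN_GO]
        for term in GO_TERMS:
            hits = sum(1 for p in partners if term in KNOWN_GO[p])
            features.append(hits / len(partners) if partners else 0.0)
    return features


print(evidence_vector("ORF_X"))
```

Because every source is reduced to the same GO-indexed vector layout, a new data set can be added under this scheme by appending another entry to `EVIDENCE`, which lengthens the vector without changing the downstream model interface.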

RESULTS

We demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. We then applied GMUP to previously unannotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories, respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation.
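The train-then-test procedure described above can be illustrated with a small sketch. This is an assumption-laden stand-in rather than the paper's network: a single logistic unit trained by stochastic gradient descent replaces the neural network, and the feature values (two evidence scores for one GO term) and labels are invented toy data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit a single logistic unit with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x, threshold=0.5):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= threshold


def precision_recall(predicted, actual):
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical annotated ORFs: two evidence scores each, label 1 if the
# ORF carries the GO term of interest. Values are invented toy data.
train_x = [(0.9, 0.8), (0.1, 0.2), (0.8, 1.0), (0.0, 0.1), (0.7, 0.6), (0.2, 0.0)]
train_y = [1, 0, 1, 0, 1, 0]
test_x = [(0.8, 0.8), (0.1, 0.1), (0.8, 0.85), (0.1, 0.12)]
test_y = [1, 0, 1, 0]

model = train_logistic(train_x, train_y)
test_pred = [predict(model, x) for x in test_x]
precision, recall = precision_recall(test_pred, test_y)
print(precision, recall)
```

Holding out part of the annotated ORFs in this way gives an accuracy estimate before the trained model is applied to ORFs that have no annotation at all.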

CONCLUSION

This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight into how certain biological processes are implemented by the interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as they become available to provide more reliable predictions or further insights into processes and interactions.

Figure (from the original article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2a5d/1481625/10e2b31cde7f/1471-2105-7-268-1.jpg

Similar articles

Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration.

BMC Bioinformatics. 2006 May 25;7:268. doi: 10.1186/1471-2105-7-268.

Learning yeast gene functions from heterogeneous sources of data using hybrid weighted Bayesian networks.

Proc IEEE Comput Syst Bioinform Conf. 2005:25-34. doi: 10.1109/csb.2005.38.

A new measure for functional similarity of gene products based on Gene Ontology.

BMC Bioinformatics. 2006 Jun 15;7:302. doi: 10.1186/1471-2105-7-302.

Annotating proteins by mining protein interaction networks.

Bioinformatics. 2006 Jul 15;22(14):e260-70. doi: 10.1093/bioinformatics/btl221.

Integrated analysis of gene expression by Association Rules Discovery.

BMC Bioinformatics. 2006 Feb 7;7:54. doi: 10.1186/1471-2105-7-54.

Joint learning of gene functions--a Bayesian network model approach.

J Bioinform Comput Biol. 2006 Apr;4(2):217-39. doi: 10.1142/s0219720006001928.

Predicting gene function in Saccharomyces cerevisiae.

Bioinformatics. 2003 Oct;19 Suppl 2:ii42-9. doi: 10.1093/bioinformatics/btg1058.

Semantic integration to identify overlapping functional modules in protein interaction networks.

BMC Bioinformatics. 2007 Jul 24;8:265. doi: 10.1186/1471-2105-8-265.

Combining evidence, biomedical literature and statistical dependence: new insights for functional annotation of gene sets.

BMC Bioinformatics. 2006 May 4;7:241. doi: 10.1186/1471-2105-7-241.

A statistical framework for genomic data fusion.

Bioinformatics. 2004 Nov 1;20(16):2626-35. doi: 10.1093/bioinformatics/bth294. Epub 2004 May 6.

Cited by

Predicting Protein Functions Based on Differential Co-expression and Neighborhood Analysis.

J Comput Biol. 2021 Jan;28(1):1-18. doi: 10.1089/cmb.2019.0120. Epub 2020 Apr 17.

ISOGO: Functional annotation of protein-coding splice variants.

Sci Rep. 2020 Jan 23;10(1):1069. doi: 10.1038/s41598-020-57974-z.

Evolutionary history and genetic diversity study of heat-shock protein 60 of .

J Genet. 2019 Jun;98(2).

Hierarchical ensemble methods for protein function prediction.

ISRN Bioinform. 2014 May 4;2014:901419. doi: 10.1155/2014/901419. eCollection 2014.

Using biological networks to improve our understanding of infectious diseases.

Comput Struct Biotechnol J. 2014 Aug 27;11(18):1-10. doi: 10.1016/j.csbj.2014.08.006. eCollection 2014 Aug.

Gene function hypotheses for the Campylobacter jejuni glycome generated by a logic-based approach.

J Mol Biol. 2013 Jan 9;425(1):186-97. doi: 10.1016/j.jmb.2012.10.014. Epub 2012 Oct 24.

Scoring protein relationships in functional interaction networks predicted from sequence data.

PLoS One. 2011 Apr 19;6(4):e18607. doi: 10.1371/journal.pone.0018607.

Amino acid metabolic origin as an evolutionary influence on protein sequence in yeast.

J Mol Evol. 2009 May;68(5):490-7. doi: 10.1007/s00239-009-9218-5. Epub 2009 Apr 9.

High-precision high-coverage functional inference from integrated data sources.

BMC Bioinformatics. 2008 Feb 25;9:119. doi: 10.1186/1471-2105-9-119.

Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach.

BMC Bioinformatics. 2007 Aug 3;8:284. doi: 10.1186/1471-2105-8-284.

References

Training feedforward networks with the Marquardt algorithm.

IEEE Trans Neural Netw. 1994;5(6):989-93. doi: 10.1109/72.329697.

Global protein function annotation through mining genome-scale data in yeast Saccharomyces cerevisiae.

Nucleic Acids Res. 2004 Dec 7;32(21):6414-24. doi: 10.1093/nar/gkh978. Print 2004.

A dynamic transcriptional network communicates growth potential to ribosome synthesis and critical cell size.

Genes Dev. 2004 Oct 15;18(20):2491-505. doi: 10.1101/gad.1228804. Epub 2004 Oct 1.

An integrated probabilistic model for functional prediction of proteins.

J Comput Biol. 2004;11(2-3):463-75. doi: 10.1089/1066527041410346.

Whole-genome annotation by using evidence integration in functional-linkage networks.

Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2888-93. doi: 10.1073/pnas.0307326101. Epub 2004 Feb 23.

Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data.

Proc Natl Acad Sci U S A. 2004 Mar 2;101(9):2981-6. doi: 10.1073/pnas.0308661100. Epub 2004 Feb 18.

Assigning function to yeast proteins by integration of technologies.

Mol Cell. 2003 Dec;12(6):1353-65. doi: 10.1016/s1097-2765(03)00476-3.

MIPS: analysis and annotation of proteins from whole genomes.

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D41-4. doi: 10.1093/nar/gkh092.

Global analysis of protein localization in budding yeast.

Nature. 2003 Oct 16;425(6959):686-91. doi: 10.1038/nature02026.

A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae).

Proc Natl Acad Sci U S A. 2003 Jul 8;100(14):8348-53. doi: 10.1073/pnas.0832373100. Epub 2003 Jun 25.
