Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration.

Author information

Xiong Jianghui, Rayner Simon, Luo Kunyi, Li Yinghui, Chen Shanguang

Affiliation

Laboratory of Space Cell and Molecular Biology, China Astronaut Research and Training Center, Beijing, PR China.

Publication information

BMC Bioinformatics. 2006 May 25;7:268. doi: 10.1186/1471-2105-7-268.

Abstract

BACKGROUND

The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process these data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP), which uses the Gene Ontology Index to relate diverse sources of experimental data by creating an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can predict function from the evidence data of unannotated proteins. The method allows us to add almost any experimental data set that relates to protein function and incorporates the Gene Ontology to our evidence data, in order to seek relationships between the different sets.
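The idea of an evidence abstraction layer feeding a neural network can be illustrated with a minimal sketch. It assumes that each experimental data set can be reduced to a mapping from an ORF to the GO terms of the annotated proteins it is linked to in that set (for example, interaction partners or co-expressed genes), so that every data set contributes one binary feature per GO term. The data-set names, ORF identifiers, the choice of GO terms and the single sigmoid neuron are all hypothetical illustrations; the abstract does not describe GMUP's actual encoding or network architecture.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical evidence sources: each maps an ORF to the GO terms of the
# annotated proteins it is linked to in that data set. Illustrative only.
Evidence = Dict[str, Dict[str, Set[str]]]


def evidence_vector(orf: str, evidence: Evidence, go_terms: List[str]) -> List[float]:
    """Abstraction layer: one binary feature per (data set, GO term) pair."""
    features: List[float] = []
    for source in sorted(evidence):  # fixed order so vectors are comparable
        linked = evidence[source].get(orf, set())
        features.extend(1.0 if term in linked else 0.0 for term in go_terms)
    return features


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Sigmoid output of a single neuron: a score in (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(xs: List[List[float]], ys: List[int],
                 epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit one sigmoid neuron with stochastic gradient descent on log-loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # gradient of log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Toy usage: score ORFs for GO:0006810 (transport) from hypothetical evidence.
go_terms = ["GO:0005840", "GO:0006810"]  # ribosome, transport
evidence: Evidence = {
    "interactions": {"orfA": {"GO:0006810"}, "orfB": {"GO:0006810"},
                     "orfC": {"GO:0005840"}, "orfD": {"GO:0005840"},
                     "orfX": {"GO:0006810"}, "orfY": {"GO:0005840"}},
    "coexpression": {"orfA": {"GO:0006810"}, "orfC": {"GO:0005840"},
                     "orfX": {"GO:0006810"}, "orfY": {"GO:0005840"}},
}
labels = {"orfA": 1, "orfB": 1, "orfC": 0, "orfD": 0}  # annotated training ORFs
xs = [evidence_vector(orf, evidence, go_terms) for orf in labels]
w, b = train_neuron(xs, list(labels.values()))

for unannotated in ("orfX", "orfY"):
    score = predict(w, b, evidence_vector(unannotated, evidence, go_terms))
    print(unannotated, round(score, 3))
```

Because every data set is reduced to the same (data set, GO term) feature scheme, adding a new data set in this sketch only lengthens the input vector; the training code is unchanged.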

RESULTS

We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. We then applied GMUP to previously unannotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories, respectively. We found that GMUP was particularly successful for ORFs whose functions are associated with the ribonucleoprotein complex, protein metabolism and transport.
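The held-out evaluation on previously annotated ORFs can be sketched as a reproducible split followed by an accuracy count on the test portion. The split fraction, the random seed, the 0.5 decision threshold, the ORF names and the stand-in scorers are hypothetical; the abstract does not state how the ORFs were divided or which accuracy measure was used.

```python
import random
from typing import Callable, Dict, List, Tuple


def split_annotated(orfs: List[str], test_fraction: float = 0.25,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle annotated ORFs reproducibly and split into (train, test)."""
    shuffled = sorted(orfs)  # sort first so the split ignores input order
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(scorer: Callable[[str], float], test: List[str],
             labels: Dict[str, int], threshold: float = 0.5) -> float:
    """Fraction of held-out ORFs whose thresholded score matches the annotation."""
    correct = sum(int(scorer(orf) >= threshold) == labels[orf] for orf in test)
    return correct / len(test)


# Hypothetical annotated ORFs: 1 = annotated to the GO term being predicted.
labels = {f"orf{i}": i % 2 for i in range(8)}
train, test = split_annotated(list(labels))

# Stand-in scorers; in practice the trained network would be used here.
perfect = lambda orf: float(labels[orf])
inverted = lambda orf: 1.0 - labels[orf]
print(len(train), len(test), accuracy(perfect, test, labels), accuracy(inverted, test, labels))
```

Only the training portion would be used to fit the model; the test portion is touched solely by `accuracy`, which keeps the estimate independent of training.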

CONCLUSION

This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight into how certain biological processes are implemented by the interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as they become available, to provide more reliable predictions or further insights into processes and interactions.
