The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications.

Author information

Bilu Yonatan, Linial Michal

Affiliations

Institute of Computer Sciences, Life Science Institute, The Hebrew University, Jerusalem 91904, Israel.

Publication information

J Comput Biol. 2002;9(2):193-210. doi: 10.1089/10665270252935412.

Abstract

Sequence similarity is probably the most widely used tool for inferring functional linkage between proteins. The fully sequenced, much-researched genome of Saccharomyces cerevisiae gives us an opportunity to compare and statistically quantify computational methods, based on sequence similarity, that aim to detect such linkage. In addition, the amount of data on Saccharomyces cerevisiae genes and proteins that is not directly based on sequence is rapidly increasing. This makes it possible to investigate the connections and correlation between classifications based on these types of data and those based solely on sequence similarity. In this work we start with a simple clustering algorithm that clusters genes based on the BLAST E-score of their similarity. We analyze how well one can infer function from these clusters and for how many of the genes whose function is currently unknown one can suggest a prediction. Given these parameters, we show that even a simple algorithm achieves better results than simply considering the BLAST output of matching genes. In the second part of the paper, we show that there is a highly significant correlation (p-value < 10^(-4) for the vast majority of the experiments) between the aforementioned clusters and other types of classification. Namely, we show that a pair of genes being clustered together is correlated with these genes having similar expression patterns in DNA array experiments and with the encoded proteins being involved in protein-protein interactions. Although this correlation is highly significant, it is, of course, not strong enough to serve, by itself, as a tool for predicting co-regulation of genes or interaction of proteins. We discuss possible explanations for this correlation. Furthermore, the statistical evaluation of these results should be considered when developing tools aimed at making such predictions.
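
The abstract does not specify the clustering algorithm or how function is read off a cluster, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes single-linkage clustering (connected components over pairs whose E-score is at or below a cutoff) and a majority vote over annotated cluster members. The function names, the cutoff value, the gene identifiers, and the annotation labels are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_by_evalue(hits, threshold=1e-10):
    """Group genes into connected components of the similarity graph.

    hits: iterable of (gene_a, gene_b, e_value) tuples, e.g. parsed from
    pairwise BLAST output. The threshold is an arbitrary, assumed cutoff.
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path compression.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b, e_value in hits:
        # Register both genes even when the hit is too weak to link them.
        root_a, root_b = find(gene_a), find(gene_b)
        if e_value <= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return list(clusters.values())


def predict_function(cluster, annotations):
    """Return the most common known annotation in a cluster, or None."""
    labels = Counter(annotations[g] for g in cluster if g in annotations)
    return labels.most_common(1)[0][0] if labels else None


# Hypothetical toy input: geneC has no annotation of its own.
hits = [
    ("geneA", "geneB", 1e-30),
    ("geneB", "geneC", 1e-15),
    ("geneC", "geneD", 0.5),  # too weak to link geneD to the others
]
annotations = {"geneA": "kinase", "geneB": "kinase"}

clusters = cluster_by_evalue(hits)
for cluster in clusters:
    print(sorted(cluster), predict_function(cluster, annotations))
```

In this toy input, geneC is linked to geneA only indirectly through geneB, which is the kind of case where a cluster can suggest a function that a single weak pairwise hit would not. Whether that matches the gain reported in the paper depends on details the abstract does not give.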
