An FPT approach for predicting protein localization from yeast genomic data.

Affiliation

State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China.

Publication information

PLoS One. 2011 Jan 19;6(1):e14449. doi: 10.1371/journal.pone.0014449.

Abstract

Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Concurrently, data mining methods have been developed and refined to handle this experimental windfall, allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods, such as Bayesian networks and logistic regression, under various statistical measures. Our results show that mFPT performed better than the other approaches in predicting protein localization. Setting 0.65 as the minimum hit-rate, we obtained 138 proteins for which the mFPT prediction differed from that of the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel predictions for the location of 17 proteins that currently do not have any defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and related predictions by others on protein localization. Our method is quite general and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research.
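To make the idea of rule mining with a minimum hit-rate concrete, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the abstract does not confirm: "hit-rate" is read as the fraction of proteins matching a feature pattern that share one localization, frequent patterns are found by brute-force counting rather than by building an actual FP-tree, and "minimum set" is read as keeping only rules with no shorter sub-pattern that predicts the same label. The function name `mine_rules`, the `min_support` and `max_len` parameters, the feature names, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (set of protein features, known localization).
RECORDS = [
    ({"nls_motif", "dna_binding"}, "nucleus"),
    ({"nls_motif", "dna_binding"}, "nucleus"),
    ({"nls_motif"}, "nucleus"),
    ({"nls_motif", "signal_peptide"}, "cytoplasm"),
    ({"signal_peptide", "tm_helix"}, "membrane"),
    ({"signal_peptide", "tm_helix"}, "membrane"),
    ({"signal_peptide"}, "cytoplasm"),
]


def mine_rules(records, min_support=2, min_hit_rate=0.65, max_len=2):
    """Return minimal rules as (pattern, label, support, hit_rate) tuples.

    Brute-force pattern counting stands in for an FP-tree here; both
    enumerate frequent feature patterns, but an FP-tree does so far more
    efficiently on large data.
    """
    pattern_counts = Counter()        # how many records contain the pattern
    pattern_label_counts = Counter()  # ... and also carry a given label
    for features, label in records:
        items = sorted(features)
        for k in range(1, max_len + 1):
            for pattern in combinations(items, k):
                pattern_counts[pattern] += 1
                pattern_label_counts[(pattern, label)] += 1

    # Keep rules that are frequent enough and reach the minimum hit-rate.
    rules = []
    for (pattern, label), hits in pattern_label_counts.items():
        support = pattern_counts[pattern]
        hit_rate = hits / support
        if support >= min_support and hit_rate >= min_hit_rate:
            rules.append((pattern, label, support, hit_rate))

    # Assumed notion of "minimal": drop a rule when a proper sub-pattern
    # already predicts the same label.
    minimal = []
    for rule in sorted(rules, key=lambda r: len(r[0])):
        pattern, label = rule[0], rule[1]
        covered = any(
            set(p) < set(pattern) and l == label for p, l, _, _ in minimal
        )
        if not covered:
            minimal.append(rule)
    return minimal


if __name__ == "__main__":
    for pattern, label, support, hit_rate in mine_rules(RECORDS):
        print(f"{' & '.join(pattern)} -> {label} "
              f"(support={support}, hit_rate={hit_rate:.2f})")
```

On the toy records, the single feature `signal_peptide` yields no rule because its matches split evenly between two localizations, which falls below the 0.65 threshold. For real data sets, a dedicated FP-growth implementation would be the more appropriate choice than exhaustive enumeration.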

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/935d/3023707/cfc31eb3ddb2/pone.0014449.g001.jpg
