FGsub: Fusarium graminearum protein subcellular localizations predicted from primary structures.

Authors

Sun Chenglei, Zhao Xing-Ming, Tang Weihua, Chen Luonan

Affiliation

Institute of Systems Biology, Shanghai University, Shanghai, China.

Publication

BMC Syst Biol. 2010 Sep 13;4 Suppl 2(Suppl 2):S12. doi: 10.1186/1752-0509-4-S2-S12.

Abstract

BACKGROUND

The fungal pathogen Fusarium graminearum (teleomorph Gibberella zeae) is the causal agent of several destructive crop diseases, in which a set of genes usually works in concert to cause disease. To function properly, the F. graminearum proteins within a cell must be targeted to different compartments, i.e. subcellular localizations. Knowing these localizations can therefore provide insights into protein functions and the pathogenic mechanisms of this destructive fungus. Unfortunately, no subcellular localization information is currently available for F. graminearum proteins. Because laboratory experiments are expensive and time-consuming, computational approaches offer an alternative way to predict F. graminearum protein subcellular localizations.

RESULTS

In this paper, we developed a novel predictor, FGsub, to predict F. graminearum protein subcellular localizations from primary structures. First, a non-redundant fungal data set with subcellular localization annotations was collected from the UniProtKB database and used as the training set, with the subcellular locations classified into 10 groups. Subsequently, Support Vector Machines (SVMs) were trained on the training set and used to predict subcellular localizations for those F. graminearum proteins that have no significant sequence similarity to proteins in the training set. The performance of the SVMs on the training set under 10-fold cross-validation demonstrates the efficiency and effectiveness of the proposed method. In addition, for F. graminearum proteins that do have significant sequence similarity to proteins in the training set, BLAST is used to transfer the annotations of homologous proteins to the uncharacterized F. graminearum proteins, so that the F. graminearum proteins are annotated more comprehensively.
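The two-branch strategy described above (homology transfer where a significant BLAST hit exists, a trained classifier otherwise) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the abstract does not specify the features, the E-value cutoff, or the data layout, so amino acid composition, the `1e-5` threshold, the `blast_hits` dictionary, and the `classifier` callable (a stand-in for a trained SVM) are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard residue in the sequence.

    One simple primary-structure feature; assumed here, not taken from the paper.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def predict_localization(
    protein_id: str,
    sequence: str,
    blast_hits: Dict[str, Tuple[str, float]],
    classifier: Callable[[List[float]], str],
    evalue_cutoff: float = 1e-5,  # hypothetical significance threshold
) -> Tuple[str, str]:
    """Return (predicted location, evidence source) for one protein.

    blast_hits maps a protein ID to (location of its best annotated hit, E-value);
    this layout is hypothetical and would be built from parsed BLAST output.
    """
    hit = blast_hits.get(protein_id)
    if hit is not None and hit[1] <= evalue_cutoff:
        # Significant similarity: transfer the homolog's annotation.
        return hit[0], "homology"
    # No significant hit: fall back to the sequence-based classifier.
    return classifier(amino_acid_composition(sequence)), "classifier"
```

Returning the evidence source alongside the label keeps it possible to report separately how many proteins were annotated by homology and how many by the classifier.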

CONCLUSIONS

In this work, we present FGsub to predict F. graminearum protein subcellular localizations in a comprehensive manner. We make four contributions to this field. First, we present a new algorithm to cope with the class imbalance problem that arises in protein subcellular localization prediction, which addresses the imbalance and helps avoid false-positive results. Second, we design an ensemble classifier that employs feature selection to further improve prediction accuracy. Third, we use BLAST to complement machine-learning-based methods, which enlarges our prediction coverage. Last and most important, we predict the subcellular localizations of 12786 F. graminearum proteins, which provides insights into protein functions and the pathogenic mechanisms of this destructive fungus.
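To make the imbalance problem concrete: when a few localizations dominate the training set, a classifier can score well by favoring them. The abstract does not describe the authors' algorithm, so the sketch below shows a different, widely used remedy (inverse-frequency class weights), purely as an illustration. The function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and common classes below 1, so
    misclassifying a rare localization costs more during training.
    This is a generic heuristic, not the algorithm proposed in the paper.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }
```

For example, with eight "cytoplasm" and two "nucleus" training labels, the weights come out to 0.625 and 2.5 respectively.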
