DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool.

Authors

Motion Graham B, Howden Andrew J M, Huitema Edgar, Jones Susan

Affiliations

Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK; Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK.

Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK.

Publication information

Nucleic Acids Res. 2015 Dec 15;43(22):e158. doi: 10.1093/nar/gkv805. Epub 2015 Aug 24.

Abstract

There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
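The abstract does not say which sequence features feed the support vector machine, so the following is only an illustrative sketch. It assumes amino-acid composition, a feature set often used for sequence-based classifiers, and shows how an accuracy figure such as the 81% and 74% quoted above would be computed from predictions. The function names `composition` and `accuracy` are hypothetical and are not taken from the authors' tool.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: composition features are used here for illustration only;
    the abstract does not state the features of the published model.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

Each protein becomes a 20-element vector that any SVM implementation could take as input, and `accuracy` compares the classifier's DNA-binding / non-binding calls against a labelled test set.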
