Most partial domains in proteins are alignment and annotation artifacts.

Author information

Triant Deborah A, Pearson William R

Affiliation

Department of Biochemistry and Molecular Genetics, University of Virginia, Box 800733, Charlottesville, VA, 22908, USA.

Publication information

Genome Biol. 2015 May 15;16(1):99. doi: 10.1186/s13059-015-0656-7.

Abstract

BACKGROUND

Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2).
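As a rough illustration of the length criterion above, the following Python sketch flags a domain hit as a candidate partial when its alignment covers less than half of the family model. It assumes each hit is described by its start and end coordinates on the profile model plus the model length, similar to what HMMER-style domain tables report. The `DomainHit` class, its field names, and the example families are hypothetical, and this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One domain alignment; class and field names are hypothetical."""
    family: str
    hmm_from: int      # first model position covered (1-based, inclusive)
    hmm_to: int        # last model position covered (1-based, inclusive)
    model_length: int  # length of the family's profile model


def model_coverage(hit: DomainHit) -> float:
    """Fraction of the family model covered by the alignment."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.model_length


def is_candidate_partial(hit: DomainHit, threshold: float = 0.5) -> bool:
    """Flag hits that cover less than `threshold` of the model length."""
    return model_coverage(hit) < threshold


# Hypothetical hits: FamA covers 120/130 positions, FamB covers 51/200.
hits = [
    DomainHit("FamA", 1, 120, 130),
    DomainHit("FamB", 40, 90, 200),
]
partials = [h.family for h in hits if is_candidate_partial(h)]
print(partials)  # ['FamB']
```

A coverage check like this only identifies candidates; as the abstract notes, many such hits turn out to be alignment or gene-model artifacts rather than genuinely truncated domains.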

RESULTS

We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts.

CONCLUSIONS

Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein's gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins.