Binary classification of protein molecules into intrinsically disordered and ordered segments.

Authors

Fukuchi Satoshi, Hosoda Kazuo, Homma Keiichi, Gojobori Takashi, Nishikawa Ken

Affiliation

Center for Information Biology & DNA Data Bank of Japan, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan.

Publication information

BMC Struct Biol. 2011 Jun 22;11:29. doi: 10.1186/1472-6807-11-29.

DOI:10.1186/1472-6807-11-29

PMID:21693062

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3199747/

Abstract

BACKGROUND

Although structural domains in proteins (SDs) are important, half of the regions in the human proteome are currently left with no SD assignments. These unassigned regions consist not only of novel SDs, but also of intrinsically disordered (ID) regions since proteins, especially those in eukaryotes, generally contain a significant fraction of ID regions. As ID regions can be inferred from amino acid sequences, a method that combines SD and ID region assignments can determine the fractions of SDs and ID regions in any proteome.

RESULTS

In contrast to other available ID prediction programs that merely identify likely ID regions, the DICHOT system we previously developed classifies the entire protein sequence into SDs and ID regions. Application of DICHOT to the human proteome revealed that residue-wise ID regions constitute 35%, SDs with similarity to PDB structures comprise 52%, while SDs with no similarity to PDB structures account for the remaining 13%. The last group consists of novel structural domains, termed cryptic domains, which serve as good targets of structural genomics. The DICHOT method applied to the proteomes of other model organisms indicated that eukaryotes generally have high ID contents, while prokaryotes do not. In human proteins, ID contents differ among subcellular localizations: nuclear proteins had the highest residue-wise ID fraction (47%), while mitochondrial proteins exhibited the lowest (13%). Phosphorylation and O-linked glycosylation sites were found to be located preferentially in ID regions. As O-linked glycans are attached to residues in the extracellular regions of proteins, the modification is likely to protect the ID regions from proteolytic cleavage in the extracellular environment. Alternative splicing events tend to occur more frequently in ID regions. We interpret this as evidence that natural selection is operating at the protein level in alternative splicing.
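The "residue-wise" percentages reported above are fractions of all residues pooled across the proteome, not averages of per-protein fractions. The following is a minimal sketch of that bookkeeping only. It is not the DICHOT method: the per-residue label strings, the label letters (`D`, `S`, `C`), and the toy proteome are hypothetical and assumed to come from some upstream classifier.

```python
from collections import Counter

# Hypothetical per-residue labels (assumed, not DICHOT output):
#   "D" = intrinsically disordered (ID) region
#   "S" = structural domain (SD) with similarity to a PDB structure
#   "C" = cryptic domain (SD with no similarity to a PDB structure)
toy_proteome = {
    "protA": "DDDDSSSSSSSSSSSSCCCC",  # 20 residues
    "protB": "SSSSSSSSSS",            # 10 residues
    "protC": "DDDDDDCCCC",            # 10 residues
}


def residue_wise_fractions(proteome):
    """Pool residues over all proteins and return the fraction per label."""
    counts = Counter()
    for labels in proteome.values():
        counts.update(labels)  # counts each character (one per residue)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


fractions = residue_wise_fractions(toy_proteome)
for label in sorted(fractions):
    print(f"{label}: {fractions[label]:.0%}")
```

Because every residue carries exactly one label, the fractions sum to 1, which mirrors how the 35%, 52%, and 13% figures for the human proteome add up to 100%. Pooling residues means long proteins contribute more than short ones.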

CONCLUSIONS

We classified entire protein sequences into two categories, SDs and ID regions, and thereby obtained a range of complete genome-wide statistics. The results of the present study provide important basic information for understanding protein structural architectures and have been made publicly available at http://spock.genes.nig.ac.jp/~genome/DICHOT.





