Regions with two amino acids in protein sequences: A step forward from homorepeats into the low complexity landscape.

Authors

Mier Pablo, Andrade-Navarro Miguel A

Affiliation

Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany.

Publication information

Comput Struct Biotechnol J. 2022 Sep 18;20:5516-5523. doi: 10.1016/j.csbj.2022.09.011. eCollection 2022.

DOI:10.1016/j.csbj.2022.09.011

PMID:36249567

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9550522/

Abstract

Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. Building on them, the next step in the study of LCRs is regions composed of two types of amino acids, which we call polyXY. We classify polyXY into three categories based on the arrangement of the two amino acid types 'X' and 'Y': direpeats (e.g. 'XYXYXY'), joined (e.g. 'XXXYYY') and shuffled (e.g. 'XYYXXY'). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which users can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.
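The three polyXY categories named in the abstract can be illustrated with a small sketch. This is not the authors' script: the abstract does not give their detection criteria (window length, allowed mismatches, thresholds), so the function name `classify_polyxy`, the minimum length of 3, and the rule of counting residue switches are all assumptions made here for illustration. The sketch only labels a region that already consists of exactly two amino acid types.

```python
def classify_polyxy(region: str) -> str:
    """Label a two-residue region as 'direpeat', 'joined', or 'shuffled'.

    Hypothetical helper, not the published polyXY search script.
    """
    if len(region) < 3:
        # Assumption: shorter inputs cannot be told apart (e.g. 'XY').
        raise ValueError("region must have at least 3 residues")
    if len(set(region)) != 2:
        raise ValueError("region must contain exactly two amino acid types")

    # Count adjacent positions where the residue type changes.
    switches = sum(a != b for a, b in zip(region, region[1:]))

    if switches == len(region) - 1:
        return "direpeat"   # strict alternation, e.g. 'XYXYXY'
    if switches == 1:
        return "joined"     # one block of X next to one block of Y, e.g. 'XXXYYY'
    return "shuffled"       # any other arrangement, e.g. 'XYYXXY'


if __name__ == "__main__":
    for example in ("XYXYXY", "XXXYYY", "XYYXXY"):
        print(example, classify_polyxy(example))
```

Counting switches keeps the three cases mutually exclusive for inputs of length 3 or more: full alternation gives the maximum number of switches, two adjacent blocks give exactly one, and everything else falls through to shuffled.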

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/081b/9550522/8054427a78b7/ga1.jpg

Similar articles

The nucleotide landscape of polyXY regions.

Comput Struct Biotechnol J. 2023 Oct 31;21:5408-5412. doi: 10.1016/j.csbj.2023.10.054. eCollection 2023.

Low Complexity Induces Structure in Protein Regions Predicted as Intrinsically Disordered.

Biomolecules. 2022 Aug 10;12(8):1098. doi: 10.3390/biom12081098.

One Step Closer to the Understanding of the Relationship IDR-LCR-Structure.

Genes (Basel). 2023 Aug 28;14(9):1711. doi: 10.3390/genes14091711.

PolyX2: Fast Detection of Homorepeats in Large Protein Datasets.

Genes (Basel). 2022 Apr 25;13(5):758. doi: 10.3390/genes13050758.

Assessing the low complexity of protein sequences via the low complexity triangle.

PLoS One. 2020 Dec 30;15(12):e0239154. doi: 10.1371/journal.pone.0239154. eCollection 2020.

Occurrence of disordered patterns and homorepeats in eukaryotic and bacterial proteomes.

Mol Biosyst. 2012 Jan;8(1):327-37. doi: 10.1039/c1mb05318c. Epub 2011 Oct 18.

Context characterization of amino acid homorepeats using evolution, position, and order.

Proteins. 2017 Apr;85(4):709-719. doi: 10.1002/prot.25250. Epub 2017 Feb 6.

HRaP: database of occurrence of HomoRepeats and patterns in proteomes.

Nucleic Acids Res. 2014 Jan;42(Database issue):D273-8. doi: 10.1093/nar/gkt927. Epub 2013 Oct 22.

The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein.

Genes (Basel). 2021 Mar 22;12(3):451. doi: 10.3390/genes12030451.

Cited by

Identification of Low-Complexity Domains by Compositional Signatures Reveals Class-Specific Frequencies and Functions Across the Domains of Life.

PLoS Comput Biol. 2024 May 15;20(5):e1011372. doi: 10.1371/journal.pcbi.1011372. eCollection 2024 May.

The nucleotide landscape of polyXY regions.

Comput Struct Biotechnol J. 2023 Oct 31;21:5408-5412. doi: 10.1016/j.csbj.2023.10.054. eCollection 2023.

One Step Closer to the Understanding of the Relationship IDR-LCR-Structure.

Genes (Basel). 2023 Aug 28;14(9):1711. doi: 10.3390/genes14091711.

Phase separating Rho: a widespread regulatory function of disordered regions in proteins revealed in bacteria.

Signal Transduct Target Ther. 2023 Jun 21;8(1):253. doi: 10.1038/s41392-023-01505-5.

References

Low Complexity Induces Structure in Protein Regions Predicted as Intrinsically Disordered.

Biomolecules. 2022 Aug 10;12(8):1098. doi: 10.3390/biom12081098.

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.

Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444. doi: 10.1093/nar/gkab1061.

Between Interactions and Aggregates: The PolyQ Balance.

Genome Biol Evol. 2021 Nov 5;13(11). doi: 10.1093/gbe/evab246.

LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains.

NAR Genom Bioinform. 2021 May 26;3(2):lqab048. doi: 10.1093/nargab/lqab048. eCollection 2021 Jun.

UniProt: the universal protein knowledgebase in 2021.

Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100.

The InterPro protein families and domains database: 20 years on.

Nucleic Acids Res. 2021 Jan 8;49(D1):D344-D354. doi: 10.1093/nar/gkaa977.

Pfam: The protein families database in 2021.

Nucleic Acids Res. 2021 Jan 8;49(D1):D412-D419. doi: 10.1093/nar/gkaa913.

UCSF ChimeraX: Structure visualization for researchers, educators, and developers.

Protein Sci. 2021 Jan;30(1):70-82. doi: 10.1002/pro.3943. Epub 2020 Oct 22.

Flanking Regions Determine the Structure of the Poly-Glutamine in Huntingtin through Mechanisms Common among Glutamine-Rich Human Proteins.

Structure. 2020 Jul 7;28(7):733-746.e5. doi: 10.1016/j.str.2020.04.008. Epub 2020 May 12.

Disordered Residues and Patterns in the Protein Data Bank.

Molecules. 2020 Mar 27;25(7):1522. doi: 10.3390/molecules25071522.
