Information quantity for secondary structure propensities of protein subsequences in the Protein Data Bank.

Authors

Kondo Ryohei, Kasahara Kota, Takahashi Takuya

Affiliations

Graduate School of Life Sciences, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan.

College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan.

Publication information

Biophys Physicobiol. 2022 Feb 8;19:1-12. doi: 10.2142/biophysico.bppb-v19.0002. eCollection 2022.

DOI:10.2142/biophysico.bppb-v19.0002

PMID:35532457

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8926306/

Abstract

Elucidating the principles of sequence-structure relationships of proteins is a long-standing issue in biology. The nature of a short segment of a protein is determined by both the subsequence of the segment itself and its environment. For example, a type of subsequence, the so-called chameleon sequences, can form different secondary structures depending on its environment. Chameleon sequences are considered to have a weak tendency to form a specific structure. Although many chameleon sequences have been identified, they are only a small part of all possible subsequences in the proteome. The strength of the tendency to take a specific structure has not been fully quantified for each subsequence. In this study, we comprehensively analyzed subsequences consisting of four to nine amino acid residues, or N-grams (4 ≤ N ≤ 9), observed in non-redundant sequences in the Protein Data Bank (PDB). Tendencies to form a specific structure, in terms of the secondary structure and accessible surface area, are quantified as information quantities for each N-gram. Although the majority of observed subsequences have low information quantity due to a lack of samples in the current PDB, thousands of N-grams with strong tendencies, including known structural motifs, were found. In addition, machine learning partially predicted the tendency of unknown N-grams, and thus, this technique helps to extract knowledge from the limited number of samples in the PDB.
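The idea of scoring each N-gram by how far its structural preferences depart from the average can be illustrated with a small sketch. The abstract does not give the formula for the information quantity, so the code below makes several assumptions: it uses the Kullback-Leibler divergence (in bits) between an N-gram's secondary-structure label counts and the background counts, it assigns each N-gram occurrence the label of its central residue, and it uses a three-state H/E/C alphabet. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def count_ngram_states(records, n):
    """Count the secondary-structure label seen at the central residue of each n-gram.

    records: iterable of (sequence, structure) string pairs of equal length.
    Assumption: the label of an n-gram occurrence is the label at offset n // 2.
    """
    counts = defaultdict(Counter)
    for seq, ss in records:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]][ss[i + n // 2]] += 1
    return counts


def information_bits(observed, background):
    """Kullback-Leibler divergence (bits) of observed label counts from background counts.

    Assumption: this stands in for the paper's information quantity. It applies
    no small-sample correction, so rarely observed n-grams can score misleadingly high.
    """
    total = sum(observed.values())
    bg_total = sum(background.values())
    bits = 0.0
    for state, k in observed.items():
        p = k / total
        q = background[state] / bg_total
        bits += p * log2(p / q)
    return bits


# Toy data (hypothetical): H = helix, E = strand, C = coil.
records = [
    ("ACDEFG", "HHHHHH"),
    ("ACDEKL", "EEEECC"),
]
counts = count_ngram_states(records, n=4)

# Background distribution: label counts pooled over all observed 4-grams.
background = Counter()
for label_counts in counts.values():
    background.update(label_counts)

# "ACDE" appears in both a helix and a strand, so it behaves like a chameleon
# sequence and carries less information than "CDEF", which is seen only in a helix.
chameleon_bits = information_bits(counts["ACDE"], background)
helix_bits = information_bits(counts["CDEF"], background)
print(f"ACDE: {chameleon_bits:.3f} bits, CDEF: {helix_bits:.3f} bits")
```

With real data, an estimate of this kind would need a minimum-count threshold or a sample-size correction before N-grams are ranked, since a single observation always yields a maximally skewed distribution.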

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/472a/8926306/68275d11f87a/19_e190002-g001.jpg

Similar articles

Intrinsic disorder in the Protein Data Bank.

J Biomol Struct Dyn. 2007 Feb;24(4):325-42. doi: 10.1080/07391102.2007.10507123.

Certain heptapeptide and large sequences representing an entire helix, strand or coil conformation in proteins are associated as chameleon sequences.

Int J Biol Macromol. 2011 Aug 1;49(2):218-22. doi: 10.1016/j.ijbiomac.2011.04.017. Epub 2011 May 5.

Chameleon sequences in neurodegenerative diseases.

Biochem Biophys Res Commun. 2016 Mar 25;472(1):209-16. doi: 10.1016/j.bbrc.2016.01.187. Epub 2016 Feb 23.

NetCSSP: web application for predicting chameleon sequences and amyloid fibril formation.

Nucleic Acids Res. 2009 Jul;37(Web Server issue):W469-73. doi: 10.1093/nar/gkp351. Epub 2009 May 25.

Improving protein secondary structure prediction based on short subsequences with local structure similarity.

BMC Genomics. 2010 Dec 2;11 Suppl 4(Suppl 4):S4. doi: 10.1186/1471-2164-11-S4-S4.

Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases.

Theor Biol Med Model. 2006 Aug 7;3:28. doi: 10.1186/1742-4682-3-28.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Structural diversity of sequentially identical subsequences of proteins: identical octapeptides can have different conformations.

Proteins. 1998 Feb 15;30(3):228-31. doi: 10.1002/(sici)1097-0134(19980215)30:3<228::aid-prot2>3.0.co;2-g.

What is the minimum number of residues to determine the secondary structural state?

J Protein Chem. 1999 Jul;18(5):579-84. doi: 10.1023/a:1020655417839.
