Exhaustive assignment of compositional bias reveals universally prevalent biased regions: analysis of functional associations in human and Drosophila.

Author information

Harrison Paul M

Affiliation

Dept. of Biology, McGill University, Stewart Biology Building, 1205 Dr. Penfield Ave., Montreal, QC, H3A 1B1, Canada.

Publication information

BMC Bioinformatics. 2006 Oct 10;7:441. doi: 10.1186/1471-2105-7-441.

Abstract

BACKGROUND

Compositionally biased (CB) regions are stretches of protein sequence composed mainly of a distinct subset of amino acid residues; such regions are frequently associated with a structural role in the cell, or with protein disorder.

RESULTS

We derived a procedure for the exhaustive assignment and classification of CB regions, and have applied it to thirteen metazoan proteomes. Sequences are initially scanned for the lowest-probability subsequences (LPSs) for single amino-acid types; subsequently, an exhaustive search for LPSs for multiple residue types is performed iteratively until convergence, to define CB region boundaries. We analysed >40,000 CB regions with >20 million residues; strikingly, nine single- or double-residue biases are universally abundant, and are consistently highly ranked across both vertebrates and invertebrates. To home in on subpopulations of CB regions of interest in human and D. melanogaster, we analysed CB region lengths, conservation, inferred functional categories and predicted protein disorder, and filtered for coiled coils and protein structures. In particular, we found that some of the universally abundant CB regions have significant associations with transcription and nuclear localization in human and Drosophila, and are also predicted to be moderately or highly disordered. Focussing on Q-based biased regions, we found that these regions are typically well conserved only within mammals (appearing in 60-80% of orthologs), with shorter human transcription-related CB regions being unconserved outside of mammals; they are also preferentially linked to protein domains such as the homeodomain and the glucocorticoid-receptor DNA-binding domain. In general, only approximately 40-50% of residues in these human and Drosophila CB regions have predicted protein disorder.
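
The single-residue scanning step described above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that the probability of a window is the binomial point probability of seeing its count of the chosen residue given a background frequency, and it only covers the first (single amino-acid type) pass, not the iterative multi-residue search. The function names, the `background_freq` value and the window bounds `min_len` and `max_len` are hypothetical choices made for illustration.

```python
from math import lgamma, log


def log_binomial_pmf(k, n, f):
    """Natural log of P(X = k) for X ~ Binomial(n, f), with 0 < f < 1."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(f) + (n - k) * log(1.0 - f)


def lowest_probability_subsequence(seq, residue, background_freq,
                                   min_len=5, max_len=50):
    """Return (log_p, start, end) for the window seq[start:end] whose count
    of `residue` is least probable under the binomial model, or None if no
    window is enriched for the residue.

    Assumption: only windows enriched above the background are considered.
    """
    # Prefix counts let us get the residue count of any window in O(1).
    prefix = [0]
    for aa in seq:
        prefix.append(prefix[-1] + (aa == residue))

    best = None
    n_seq = len(seq)
    for start in range(n_seq):
        longest = min(max_len, n_seq - start)
        for length in range(min_len, longest + 1):
            end = start + length
            k = prefix[end] - prefix[start]
            if k <= length * background_freq:
                continue  # not enriched, skip
            log_p = log_binomial_pmf(k, length, background_freq)
            if best is None or log_p < best[0]:
                best = (log_p, start, end)
    return best
```

Calling `lowest_probability_subsequence(seq, "Q", 0.05)` on a sequence containing a glutamine run returns the log-probability and the half-open boundaries of the most biased window. The brute-force double loop is quadratic in sequence length, which is acceptable for a sketch but would need optimization for proteome-scale scans.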

CONCLUSION

These data are of use for the further functional characterization of genes, and for structural genomics initiatives.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/3c288bce35cf/1471-2105-7-441-1.jpg
