Exhaustive assignment of compositional bias reveals universally prevalent biased regions: analysis of functional associations in human and Drosophila.

Author information

Harrison Paul M

Affiliation

Dept. of Biology, McGill University, Stewart Biology Building, 1205 Dr. Penfield Ave, Montreal, QC, H3A 1B1, Canada.

Publication information

BMC Bioinformatics. 2006 Oct 10;7:441. doi: 10.1186/1471-2105-7-441.

DOI:10.1186/1471-2105-7-441
PMID:17032452
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1618407/
Abstract

BACKGROUND

Compositionally biased (CB) regions are stretches of protein sequence composed mainly of a distinct subset of amino acid residues; such regions are frequently associated with a structural role in the cell or with protein disorder.

RESULTS

We derived a procedure for the exhaustive assignment and classification of CB regions, and applied it to thirteen metazoan proteomes. Sequences are initially scanned for the lowest-probability subsequences (LPSs) for single amino-acid types; subsequently, an exhaustive search for LPSs for multiple residue types is performed iteratively until convergence, to define CB region boundaries. We analysed > 40,000 CB regions with > 20 million residues; strikingly, nine single- or double-residue biases are universally abundant, and are consistently highly ranked across both vertebrates and invertebrates. To home in on subpopulations of CB regions of interest in human and D. melanogaster, we analysed CB region lengths, conservation, inferred functional categories and predicted protein disorder, and filtered for coiled coils and protein structures. In particular, we found that some of the universally abundant CB regions have significant associations with transcription and nuclear localization in human and Drosophila, and are also predicted to be moderately or highly disordered. Focussing on Q-based biased regions, we found that these regions are typically well conserved only within mammals (appearing in 60-80% of orthologs), with shorter human transcription-related CB regions being unconserved outside of mammals; they are also preferentially linked to protein domains such as the homeodomain and the glucocorticoid-receptor DNA-binding domain. In general, only approximately 40-50% of residues in these human and Drosophila CB regions have predicted protein disorder.

CONCLUSION

These data are of use for the further functional characterization of genes, and for structural genomics initiatives.
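
The first step described in the Results, scanning a sequence for the lowest-probability subsequence (LPS) of a single amino-acid type, can be illustrated with a minimal Python sketch. The abstract does not state the probability model, so the binomial tail probability used here is an assumption, as are the function names, the window-length limits, and the background frequency. The iterative multi-residue search that defines the final CB region boundaries is not covered.

```python
from math import comb


def binomial_tail(k: int, n: int, f: float) -> float:
    """P(X >= k) for X ~ Binomial(n, f): chance of seeing at least k
    residues of one type in a window of length n at background frequency f."""
    return sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k, n + 1))


def lowest_probability_subsequence(seq, residue, f, min_len=5, max_len=50):
    """Return (probability, start, end) for the window seq[start:end] that is
    least likely under the background model. min_len and max_len are
    illustrative limits, not values taken from the paper."""
    # Prefix counts let us get the residue count of any window in O(1).
    prefix = [0]
    for aa in seq:
        prefix.append(prefix[-1] + (aa == residue))

    best = (1.0, 0, 0)
    for start in range(len(seq)):
        last_end = min(len(seq), start + max_len)
        for end in range(start + min_len, last_end + 1):
            k = prefix[end] - prefix[start]
            p = binomial_tail(k, end - start, f)
            if p < best[0]:
                best = (p, start, end)
    return best


# Toy sequence with a 12-residue glutamine (Q) run; 0.05 is an assumed
# background frequency for Q, chosen only for illustration.
toy_seq = "MKT" + "Q" * 12 + "LVSA" * 3
p, start, end = lowest_probability_subsequence(toy_seq, "Q", 0.05)
print(toy_seq[start:end], start, end)
```

On the toy sequence the scan picks out exactly the glutamine run, because extending the window in either direction adds non-Q residues and raises the probability. This brute-force scan is quadratic in the window range and is meant only to show the idea; a proteome-scale analysis would need a faster search.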

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/3c288bce35cf/1471-2105-7-441-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/055b8cdfaab9/1471-2105-7-441-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/5efc63574f76/1471-2105-7-441-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/00aa1bef4d76/1471-2105-7-441-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/6f7f9bfc2c80/1471-2105-7-441-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/3c211a75852d/1471-2105-7-441-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/adfc/1618407/b3cf83a7a08d/1471-2105-7-441-7.jpg
