Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences.

Authors

Shen Shiyi, Kai Bo, Ruan Jishou, Huzil J Torin, Carpenter Eric, Tuszynski Jack A

Affiliations

College of Mathematical Science and LPMC, Nankai University, Tianjin 300071, PR China.

Department of Oncology, Division of Experimental Oncology, Cross Cancer Institute, University of Alberta, 11560 University Avenue, Edmonton, Canada AB T6G 1Z2.

Publication information

Physica A. 2006 Oct 15;370(2):651-662. doi: 10.1016/j.physa.2006.03.004. Epub 2006 Apr 3.

DOI: 10.1016/j.physa.2006.03.004
PMID: 32288076
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7127678/
Abstract

Here, we describe a unique probabilistic evaluation of the 20, naturally occurring, amino acids and their distributions within the Swiss-Prot and Complete Human Genebank databases. We have developed a computational technique that imparts both directionality and length constraints into searches for unique combinations of amino acids within protein sequences. Using statistical approaches, we have carried out searches of all possible two- and three-residue motifs contained within these databases. This technique is based on the unusually high occurrence of a small number of these motifs when compared to the expected probability of finding a specific residue grouping within a given database. Subsequent filtering of this search to identify such unique combinations has provided several examples that can be used as markers to identify particular proteins within or across databases. We focus on three of these motifs, which were found to be of greatest interest to us. The CC, CM and a combination of the two, CCM motifs all occur either more or less frequently than would be predicted based on standard amino acid distributions within the entire human proteome.
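
The comparison the abstract describes, observed motif counts against the counts expected from single-residue frequencies, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `motif_enrichment`, the independence-based expectation, and the toy sequences are all assumptions made for illustration. Motifs are treated as ordered (so CM is distinct from MC), which is one simple way to respect directionality, and `k` fixes the motif length at two or three residues.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_enrichment(sequences, k=2):
    """Compare observed and expected counts of ordered k-residue motifs.

    Hypothetical helper, not taken from the paper. The expected count
    assumes residues occur independently at their overall frequencies.
    Returns {motif: (observed, expected, observed / expected)}.
    """
    residue_counts = Counter()
    motif_counts = Counter()
    total_windows = 0

    for seq in sequences:
        residue_counts.update(seq)
        n_windows = max(len(seq) - k + 1, 0)
        # Slide a window of length k left to right along the sequence.
        for i in range(n_windows):
            motif_counts[seq[i:i + k]] += 1
        total_windows += n_windows

    # Single-residue frequencies over the standard alphabet only.
    total_residues = sum(residue_counts[aa] for aa in AMINO_ACIDS)
    freq = {aa: residue_counts[aa] / total_residues for aa in AMINO_ACIDS}

    result = {}
    for motif in map("".join, product(AMINO_ACIDS, repeat=k)):
        probability = 1.0
        for aa in motif:
            probability *= freq[aa]
        expected = probability * total_windows
        observed = motif_counts[motif]
        ratio = observed / expected if expected > 0 else float("nan")
        result[motif] = (observed, expected, ratio)
    return result


if __name__ == "__main__":
    # Toy input only; a real analysis would read sequences from a database export.
    toy = ["CCMA", "ACCM"]
    pairs = motif_enrichment(toy, k=2)
    triples = motif_enrichment(toy, k=3)
    for motif, table in (("CC", pairs), ("CM", pairs), ("CCM", triples)):
        observed, expected, ratio = table[motif]
        print(f"{motif}: observed={observed} expected={expected:.2f} ratio={ratio:.2f}")
```

A ratio well above 1 marks a motif that occurs more often than the independence model predicts, and a ratio well below 1 marks one that occurs less often. A real analysis would also need a significance test and some handling of non-standard residue codes, both of which this sketch omits.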
