Comparison of sequence masking algorithms and the detection of biased protein sequence regions.

Authors

Kreil David P, Ouzounis Christos A

Affiliation

Department of Genetics/Inference Group (Cavendish Laboratory), University of Cambridge, Cambridge, UK.

Publication

Bioinformatics. 2003 Sep 1;19(13):1672-81. doi: 10.1093/bioinformatics/btg212.

DOI: 10.1093/bioinformatics/btg212
PMID: 12967964

Abstract

MOTIVATION

Separation of protein sequence regions according to their local information complexity and subsequent masking of low complexity regions has greatly enhanced the reliability of function prediction by sequence similarity. Comparisons with alternative methods that focus on compositional sequence bias rather than information complexity measures have shown that removal of compositional bias yields at least as sensitive and much more specific results. Besides the application of sequence masking algorithms to sequence similarity searches, the study of the masked regions themselves is of great interest. Traditionally, however, these have been neglected despite evidence of their functional relevance.
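As a rough illustration of what "local information complexity" and "masking" can mean in practice, the sketch below scores each sliding window by its Shannon entropy and replaces low-entropy windows with `X`. This is not the algorithm evaluated in the paper; the function names, the window length of 12, and the entropy threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue counts in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue covered by a low-entropy window with 'X'.

    `window` and `threshold` are illustrative defaults, not values
    taken from the paper.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q run flanked by diverse residues.
    example = "ACDEFGHIKL" + "Q" * 12 + "MNPRSTVWYA"
    print(mask_low_complexity(example, window=12, threshold=1.0))
```

A masked sequence produced this way could then be passed to a similarity search in place of the original, so that the repetitive stretch no longer contributes spurious matches.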

RESULTS

Here we demonstrate that compositional bias seems to be a more effective measure for the detection of biologically meaningful signals. Typical results on proteins are compared to results for sequences that have been randomized in various ways, conserving composition and local correlations for individual proteins or the entire set. It is remarkable that low-complexity regions have the same form of distribution in proteins as in randomized sequences, and that the signal from randomized sequences with conserved local correlations and amino acid composition almost matches the signal from proteins. This is not the case for sequence bias, which hence seems to be a genuinely biological phenomenon in contrast to patches of low complexity.
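The randomized controls described above can be pictured with two simple null models. The sketch below is an assumption about how such controls might be built, not the authors' procedure: `shuffle_sequence` conserves the exact amino acid composition of one protein, while `markov_resample` draws a new sequence from first-order transitions observed in that protein, which keeps nearest-neighbour correlations but conserves composition only approximately. Both function names are hypothetical.

```python
import random
from collections import defaultdict


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Permute residues: exact composition is kept, all order is destroyed."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def markov_resample(seq: str, rng: random.Random) -> str:
    """Sample a same-length sequence from first-order transitions seen in `seq`.

    Nearest-neighbour correlations are retained; composition is only
    approximately conserved.
    """
    if not seq:
        return ""
    transitions = defaultdict(list)
    for current, following in zip(seq, seq[1:]):
        transitions[current].append(following)
    out = [seq[0]]
    while len(out) < len(seq):
        followers = transitions.get(out[-1])
        # Dead end (residue seen only at the very end of `seq`):
        # fall back to a draw weighted by overall composition.
        out.append(rng.choice(followers) if followers else rng.choice(seq))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is repeatable
    protein = "MKKLLPTAAAGGQQQQ"  # hypothetical toy sequence
    print(shuffle_sequence(protein, rng))
    print(markov_resample(protein, rng))
```

Running a masking method on many such randomized sequences and comparing the masked-region statistics with those from the real proteins is one way to judge whether a detected signal exceeds what composition and local correlations alone would produce.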

Similar articles

1
Comparison of sequence masking algorithms and the detection of biased protein sequence regions.
Bioinformatics. 2003 Sep 1;19(13):1672-81. doi: 10.1093/bioinformatics/btg212.
2
A new algorithm for detecting low-complexity regions in protein sequences.
Bioinformatics. 2005 Jan 15;21(2):160-70. doi: 10.1093/bioinformatics/bth497. Epub 2004 Aug 27.
3
SATCHMO: sequence alignment and tree construction using hidden Markov models.
Bioinformatics. 2003 Jul 22;19(11):1404-11. doi: 10.1093/bioinformatics/btg158.
4
Consensus alignment for reliable framework prediction in homology modeling.
Bioinformatics. 2003 Sep 1;19(13):1682-91. doi: 10.1093/bioinformatics/btg211.
5
A new similarity measure among protein sequences.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:347-52.
6
Sensitive pattern discovery with 'fuzzy' alignments of distantly related proteins.
Bioinformatics. 2003;19 Suppl 1:i130-7. doi: 10.1093/bioinformatics/btg1017.
7
Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments.
Bioinformatics. 2003 Aug 12;19(12):1531-9. doi: 10.1093/bioinformatics/btg185.
8
Clustering of amino acids for protein secondary structure prediction.
J Bioinform Comput Biol. 2004 Jun;2(2):333-42. doi: 10.1142/s0219720004000582.
9
Functional proteomics with biolinguistic methods. n-grams deliver sensitive portrayals of gene similarity.
IEEE Eng Med Biol Mag. 2005 May-Jun;24(3):73-80. doi: 10.1109/memb.2005.1436463.
10
Protein homology detection using string alignment kernels.
Bioinformatics. 2004 Jul 22;20(11):1682-9. doi: 10.1093/bioinformatics/bth141. Epub 2004 Feb 26.

Cited by

1
Pseudomonas aeruginosa core metabolism exerts a widespread growth-independent control on virulence.
Sci Rep. 2020 Jun 11;10(1):9505. doi: 10.1038/s41598-020-66194-4.
2
Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved.
Nucleic Acids Res. 2019 Nov 4;47(19):9998-10009. doi: 10.1093/nar/gkz730.
3
Disentangling the complexity of low complexity proteins.
Brief Bioinform. 2020 Mar 23;21(2):458-472. doi: 10.1093/bib/bbz007.
4
Comparative functional analysis of proteins containing low-complexity predicted amyloid regions.
PeerJ. 2018 Oct 30;6:e5823. doi: 10.7717/peerj.5823. eCollection 2018.
5
An analysis of single amino acid repeats as use case for application specific background models.
BMC Bioinformatics. 2011 May 19;12:173. doi: 10.1186/1471-2105-12-173.
6
New cytochrome P450 1B1, 1C2 and 1D1 genes in the killifish Fundulus heteroclitus: Basal expression and response of five killifish CYP1s to the AHR agonist PCB126.
Aquat Toxicol. 2009 Jul 26;93(4):234-43. doi: 10.1016/j.aquatox.2009.05.008. Epub 2009 May 15.
7
Environmental sensing and response genes in cnidaria: the chemical defensome in the sea anemone Nematostella vectensis.
Cell Biol Toxicol. 2008 Dec;24(6):483-502. doi: 10.1007/s10565-008-9107-5. Epub 2008 Oct 28.