An analysis of single amino acid repeats as use case for application specific background models.

Affiliation

Chair of Bioinformatics, Boku University Vienna, Muthgasse 18, 1190 Vienna, Austria.

Publication information

BMC Bioinformatics. 2011 May 19;12:173. doi: 10.1186/1471-2105-12-173.

DOI:10.1186/1471-2105-12-173
PMID:21595908
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3124433/
Abstract

BACKGROUND

Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions.

RESULTS

Traditional Markov-chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-fold validation experiments reveal that the alternative regression models capture the multi-variate trends well, despite their low dimensionality and in contrast even to higher-order Markov-predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine-repeats in signal-peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis.

CONCLUSIONS

Using a specific case of studying biased regions as an example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
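To make the idea of a background model for single amino acid runs concrete, the sketch below counts maximal runs in protein sequences and compares the observed count with the expectation under a zero-order (independent residues) background, using a Poisson approximation for the tail probability. This is an illustrative baseline only, assuming independent residues and approximately Poisson-distributed run counts; it is not the regression model or the higher-order Markov predictors evaluated in the paper. All function names and the example sequences are hypothetical.

```python
from collections import Counter
from itertools import groupby
from math import exp


def find_runs(seq, min_len=3):
    """Return (residue, start, length) for each maximal single-residue run."""
    runs = []
    pos = 0
    for residue, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((residue, pos, length))
        pos += length
    return runs


def residue_frequencies(seqs):
    """Estimate residue frequencies from a reference (training) set."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def expected_runs(p, seq_len, min_len):
    """Expected number of maximal runs of length >= min_len for a residue
    with frequency p, assuming independent positions (zero-order model)."""
    if seq_len < min_len:
        return 0.0
    run_prob = p ** min_len
    # A run may start at position 0, or at any later position that is
    # preceded by a different residue.
    return run_prob + (seq_len - min_len) * (1.0 - p) * run_prob


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = exp(-expected)
    cdf = 0.0
    for k in range(observed):
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # Hypothetical toy sequences; a real analysis would use curated sets.
    reference = ["MKTAYIAKQRQISFVKSHFSRQ", "MSDNELLKAVGAPLLEQ"]
    query = ["MKLLLLLAVGLLLAAS", "MRTLLLLGAVS"]

    freqs = residue_frequencies(reference)
    p_leu = freqs.get("L", 0.0)
    observed = sum(
        1 for s in query for (aa, _, _) in find_runs(s, 3) if aa == "L"
    )
    expected = sum(expected_runs(p_leu, len(s), 3) for s in query)
    print(observed, expected, poisson_tail(observed, expected))
```

Swapping the `reference` set changes the estimated frequencies and therefore the expected count, which mirrors the abstract's point that the choice of reference data defines what counts as "background".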
