Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars.

Author information

Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland.

Wydział Chemiczny, Katedra Chemii Bioorganicznej, Politechnika Wrocławska, Wrocław, Poland.

Publication information

BMC Bioinformatics. 2021 Apr 29;22(1):222. doi: 10.1186/s12859-021-04139-y.

Abstract

BACKGROUND

Amyloid signaling motifs are a class of protein motifs which share basic structural and functional features despite the lack of clear sequence homology. They are hard to detect in large sequence databases either with the alignment-based profile methods (due to short length and diversity) or with generic amyloid- and prion-finding tools (due to insufficient discriminative power). We propose to address the challenge with a machine learning grammatical model capable of generalizing over diverse collections of unaligned yet related motifs.

RESULTS

First, we introduce and test improvements to our probabilistic context-free grammar framework for protein sequences that allow for inferring more sophisticated models achieving high sensitivity at low false positive rates. Then, we infer universal grammars for a collection of recently identified bacterial amyloid signaling motifs and demonstrate that the method is capable of generalizing by successfully searching for related motifs in fungi. The results are compared to available alternative methods. Finally, we conduct spectroscopy and staining analyses of selected peptides to verify their structural and functional relationship.

CONCLUSIONS

While the profile HMMs remain the method of choice for modeling homologous sets of sequences, PCFGs seem more suitable for building meta-family descriptors and extrapolating beyond the seed sample.
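
For readers unfamiliar with PCFGs, the following is a minimal sketch of how a probabilistic context-free grammar scores a sequence. It is not the authors' framework: the grammar, its rule probabilities, the three-letter alphabet, and the function name `inside_probability` are all hypothetical and chosen only for illustration. It uses the standard inside algorithm on a grammar in Chomsky normal form, summing the probabilities of all parse trees that derive the sequence.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustrative only,
# not taken from the paper). Lexical rules: (nonterminal, residue) -> prob.
LEXICAL = {("A", "Q"): 0.6, ("A", "N"): 0.4, ("B", "G"): 0.5}
# Binary rules: (parent, left child, right child) -> prob.
BINARY = {("S", "A", "B"): 1.0, ("B", "A", "B"): 0.5}


def inside_probability(seq, start="S"):
    """Return P(start derives seq) under the toy grammar above."""
    n = len(seq)
    if n == 0:
        return 0.0
    # chart[(i, j)][X] holds the probability that X derives seq[i:j].
    chart = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 are filled from the lexical rules.
    for i, residue in enumerate(seq):
        for (lhs, terminal), p in LEXICAL.items():
            if terminal == residue:
                chart[(i, i + 1)][lhs] += p
    # Longer spans combine two shorter spans at every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, left, right), p in BINARY.items():
                    p_left = chart[(i, k)].get(left, 0.0)
                    p_right = chart[(k, j)].get(right, 0.0)
                    if p_left and p_right:
                        chart[(i, j)][lhs] += p * p_left * p_right
    return chart[(0, n)].get(start, 0.0)


if __name__ == "__main__":
    for candidate in ("QG", "QNG", "GQ"):
        print(candidate, inside_probability(candidate))
```

Under this toy grammar, "QNG" has a single parse with probability of about 0.06, while "GQ" cannot be derived and scores 0. In a motif search, a score like this would typically be compared against a threshold or a background model to decide whether a sequence window matches; that step is not shown here.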

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea23/8086366/7b681560b76f/12859_2021_4139_Fig1_HTML.jpg
