Amyloidogenic motifs revealed by n-gram analysis.

Affiliations

Department of Genomics, University of Wrocław, Wrocław, Poland.

Faculty of Pure and Applied Mathematics, Wrocław University of Science and Technology, Wrocław, Poland.

Publication information

Sci Rep. 2017 Oct 11;7(1):12961. doi: 10.1038/s41598-017-13210-9.

DOI:10.1038/s41598-017-13210-9

PMID:29021608

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5636826/

Abstract

Amyloids are proteins associated with several clinical disorders, including Alzheimer's disease and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns that define hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet that performs best in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked on an external data set against the most popular tools for detecting amyloid peptides and obtained the highest values of the performance measures (AUC: 0.90, MCC: 0.63). Our results show sequential patterns in amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that had previously been confirmed experimentally. AmyloGram is available as a web server (http://smorfland.uni.wroc.pl/shiny/AmyloGram/) and as the R package AmyloGram. The R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis .
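The idea of encoding peptides in a reduced alphabet and then counting n-grams can be illustrated with a minimal Python sketch. The four-letter grouping below is a hypothetical example invented for illustration; it is not the alphabet selected for AmyloGram. The function names are likewise assumptions, and the sketch covers only contiguous n-grams, not the random forest step.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet, for illustration only;
# this is NOT the alphabet chosen by AmyloGram.
GROUPS = {
    "a": "AVLIMFWC",  # mostly hydrophobic residues
    "b": "STNQYGPH",  # polar or small residues
    "c": "DE",        # negatively charged residues
    "d": "KR",        # positively charged residues
}

# Invert the grouping: amino acid letter -> reduced-alphabet letter.
AA_TO_GROUP = {aa: group for group, aas in GROUPS.items() for aa in aas}


def reduce_sequence(seq):
    """Re-encode a peptide (one-letter codes) in the reduced alphabet."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def count_ngrams(seq, n):
    """Count contiguous n-grams (substrings of length n) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


if __name__ == "__main__":
    peptide = "KLVFFA"                    # example hexapeptide
    reduced = reduce_sequence(peptide)    # "daaaaa" under the grouping above
    print(reduced)
    print(dict(count_ngrams(reduced, 2)))
```

In a setup like the one the abstract describes, counts of this kind would serve as features for a classifier, and the choice of grouping would be what varies between candidate alphabets.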

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbe4/5636826/c9efdbfb7912/41598_2017_13210_Fig1_HTML.jpg
