PIT Bioinformatics Group, Eötvös University, Budapest, Hungary.
Uratim Ltd., Budapest, Hungary.
FEBS Open Bio. 2018 Nov 22;9(1):185-190. doi: 10.1002/2211-5463.12524. eCollection 2019 Jan.
The Protein Data Bank (PDB) currently contains more than 135 000 entries. Of these, relatively few can be identified as amyloid structures, since amyloids are insoluble in water. Consequently, most amyloid structures deposited in the PDB are in the form of solid-state NMR data. Based on a geometric analysis of these deposited structures, we have prepared an automatically updated web server that generates a list of the deposited amyloid structures, as well as entries of globular proteins that have amyloid-like substructures of a given size and characteristics. We have found that, by applying only appropriately selected geometric conditions, it is possible to identify deposited amyloid structures and a number of globular proteins with amyloid-like substructures. We have analyzed these globular proteins and found evidence in the literature that many of them form amyloids more easily than many other globular proteins. Our results relate to the method of Stanković et al. [Stanković I et al. (2017) IPSI BgD Tran Int Res 13, 47-51], who applied a hybrid textual-search and geometric approach for finding amyloids in the PDB. Anyone who intends to identify a subset of the PDB for a particular application needs to re-run the identification algorithm periodically, since in 2017 an average of 30 new entries per day were deposited in the data bank. Our web server is updated regularly and automatically, and the identified amyloid and partial amyloid structures can be viewed, or their list downloaded, at https://pitgroup.org/amyloid.
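The abstract does not state which geometric conditions are applied, so the following is only an illustrative sketch of what a purely geometric test could look like, not the authors' algorithm. It assumes that each strand is given as a list of C-alpha coordinates in ångströms, and it treats a stretch as "amyloid-like" when several consecutive residues each have a C-alpha of a neighbouring strand at roughly the cross-β spacing of about 4.8 Å. The function names `longest_paired_run` and `is_amyloid_like`, the distance window of 4.0 to 5.5 Å, and the minimum run length of 5 are all hypothetical choices made for this example.

```python
import math

def longest_paired_run(strand_a, strand_b, lo=4.0, hi=5.5):
    """Return the longest run of consecutive residues in strand_a whose
    C-alpha lies between lo and hi angstroms from some C-alpha in strand_b.

    The distance window is an assumption for illustration only.
    """
    best = run = 0
    for a in strand_a:
        paired = any(lo <= math.dist(a, b) <= hi for b in strand_b)
        run = run + 1 if paired else 0
        best = max(best, run)
    return best

def is_amyloid_like(strand_a, strand_b, min_run=5):
    """Hypothetical criterion: at least min_run consecutive paired residues."""
    return longest_paired_run(strand_a, strand_b) >= min_run

# Synthetic coordinates: an idealised straight strand, one copy stacked
# 4.8 angstroms away, and one copy too far away to count as stacked.
strand = [(3.3 * i, 0.0, 0.0) for i in range(8)]
stacked = [(x, 4.8, z) for x, _, z in strand]
distant = [(x, 12.0, z) for x, _, z in strand]

print(is_amyloid_like(strand, stacked))
print(is_amyloid_like(strand, distant))
```

In practice the coordinates would be read from PDB entries, and because new entries are deposited daily, a check of this kind would have to be re-run over the updated data bank at regular intervals.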