PIT Bioinformatics Group, Eötvös University, Budapest H-1117, Hungary.
Uratim Ltd., Budapest H-1118, Hungary.
J Integr Bioinform. 2021 Jul 26;19(1):20200043. doi: 10.1515/jib-2020-0043.
The Protein Data Bank (PDB) today contains more than 174,000 entries with the three-dimensional structures of biological macromolecules. Using the rich resources of this repository, it is possible to identify subsets with specific, interesting properties for different applications. Our research group has prepared an automatically updated list of amyloid and probably amyloidogenic molecules, the PDB_Amyloid collection, which is freely available at http://pitgroup.org/amyloid. This resource uses exclusively the geometric properties of the steric structures to identify amyloids. In the present contribution, we analyze the starting (i.e., prefix) subsequences of the characteristic parallel beta-sheets of the structures in the PDB_Amyloid collection, and identify further occurrences of these length-5 prefix subsequences in the whole PDB data set. In this way, simply by searching for entries that contain these prefixes, we have identified numerous proteins whose normal or irregular functions involve amyloid formation, structural misfolding, or anti-coagulant properties. These include the T-cell receptor (TCR), bound to the major histocompatibility complexes MHC-1 and MHC-2; the p53 tumor suppressor protein; a mycobacterial RNA polymerase transcription initiation complex; the human bridging integrator protein BIN-1; and the tick anti-coagulant peptide TAP.
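The prefix search described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `strand_prefixes` and `find_occurrences`, the dictionary-based input format, and the toy sequences are all assumptions made for illustration, and the strings are arbitrary letters rather than real PDB data. The sketch collects the length-5 prefixes of a set of beta-strand sequences and reports every position at which one of them occurs in a collection of full-length sequences.

```python
from collections import defaultdict

PREFIX_LEN = 5  # the "length-5 prefix subsequences" of the abstract


def strand_prefixes(strands, k=PREFIX_LEN):
    """Return the set of length-k prefixes of the given strand sequences.

    Strands shorter than k residues are skipped (an assumption of this sketch).
    """
    return {s[:k] for s in strands if len(s) >= k}


def find_occurrences(prefixes, sequences, k=PREFIX_LEN):
    """Map each matched prefix to a sorted list of (entry_id, 0-based offset) hits.

    `sequences` is a hypothetical dict of entry id -> one-letter sequence.
    """
    hits = defaultdict(list)
    for entry_id, seq in sequences.items():
        # Slide a window of length k over the sequence and test set membership.
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in prefixes:
                hits[window].append((entry_id, i))
    return {p: sorted(h) for p, h in hits.items()}


# Toy, made-up input: arbitrary letter strings, not real amyloid strands.
toy_strands = ["ACDEFGH", "WYWYWK", "NFG"]
toy_sequences = {"toy1": "MMACDEFQQWYWYW", "toy2": "AAAAAAA"}

prefixes = strand_prefixes(toy_strands)
hits = find_occurrences(prefixes, toy_sequences)
```

Storing the prefixes in a set keeps each window lookup constant-time on average, so the scan is linear in the total sequence length. A real analysis would read the strand and chain sequences from PDB entries rather than from hard-coded strings.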