

THE-DB: a threading model database for comparative protein structure analysis of the E. coli K12 and human proteomes.

Author information

Department of Computational Medicine and Bioinformatics, University of Michigan, Washtenaw Avenue, Ann Arbor, MI, USA.

Department of Bioinformatics, Boston University, Cummington Mall, Boston, MA, USA.

Publication information

Database (Oxford). 2018 Jan 1;2018:bay090. doi: 10.1093/database/bay090.

Abstract

New methodology must be developed to improve the ability to characterize the growing number of amino acid sequences, which vastly exceeds the number of experimentally determined protein structures. Homologous proteins can be used as structural templates for modeling proteins that do not have experimentally determined structures. However, in many cases, there are no homologous proteins (typically <30% sequence identity) with determined structures from which a query sequence can be reliably modeled. The aim of protein threading is to use features such as secondary structure, solvent accessibility and torsional angles, in addition to sequence patterns, to identify structural templates from the Protein Data Bank to assist full-length atomic-level structural modeling. However, there are still numerous protein sequences for which correct templates cannot be recognized. This raises the question of what attributes allow query sequences to be matched to correct but distantly homologous templates. To aid the investigation of this question and to provide genome-scale protein structure models for the biological community, a database called THE-DB (threading hard and easy protein database) has been developed. It makes it possible to analyze over 15 000 query sequences from the Escherichia coli (E. coli) K12 and human proteomes and to find their three-dimensional templates derived from state-of-the-art threading algorithms, which is not feasible with existing protein template databases. The E. coli K12 and human data can be downloaded in bulk from the THE-DB page.
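To make the "<30% sequence identity" figure concrete, the following is a minimal sketch of how percent identity between a query and a template might be computed from an existing pairwise alignment. It is not THE-DB's method or any threading program's scoring function. The function names, the gap character `-`, and the choice to divide by the number of aligned (non-gap) columns are all assumptions made for illustration; other tools use different denominators.

```python
# Hypothetical helper names; not part of THE-DB or any threading package.
IDENTITY_CUTOFF = 30.0  # percent, taken from the "<30% sequence identity" figure in the abstract


def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-').

    Assumes both strings come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip gap columns (an assumed convention)
        aligned += 1
        if q.upper() == t.upper():
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def is_distant_template(aligned_query: str, aligned_template: str,
                        cutoff: float = IDENTITY_CUTOFF) -> bool:
    """True when the pair falls below the identity cutoff."""
    return percent_identity(aligned_query, aligned_template) < cutoff
```

Under these assumptions, a template flagged by `is_distant_template` is the kind of case where sequence similarity alone is weak and the additional threading features described above would have to carry the match.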


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cdbb/6146127/667cbb2b0e9c/bay090f1.jpg
