Motono Chie, Nakata Junichi, Koike Ryotaro, Shimizu Kana, Shirota Matsuyuki, Amemiya Takayuki, Tomii Kentaro, Nagano Nozomi, Sakaya Naofumi, Misoo Kiyotaka, Sato Miwa, Kidera Akinori, Hiroaki Hidekazu, Shirai Tsuyoshi, Kinoshita Kengo, Noguchi Tamotsu, Ota Motonori
Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan.
Nucleic Acids Res. 2011 Jan;39(Database issue):D487-93. doi: 10.1093/nar/gkq1057. Epub 2010 Nov 3.
Most proteins from higher organisms are known to be multi-domain proteins and to contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, taking human proteins as an example, we developed a dedicated protein-structure-prediction pipeline and accumulated its output in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. In the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith-Waterman profile-profile alignment), global-local alignment methods (FORTE), a prediction tool for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42,581 protein-domain models in approximately 24,900 unique human protein sequences from the RefSeq database. Annotations of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding of protein structure-function relationships.
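The abstract names an elastic network model combined with linear response theory as the route from an apo model to a holo model, but gives no implementation details. The following is a minimal illustrative sketch of that general idea, not the SAHG pipeline's code. It assumes an anisotropic network model over C-alpha-like coordinates with a uniform spring constant and a 12 Å cutoff, and it uses the fact that for a harmonic network the covariance matrix is proportional to the pseudo-inverse of the Hessian, so the predicted displacement is pinv(H) applied to the external force. The function names, the toy helix coordinates and the force standing in for a ligand contact are all hypothetical.

```python
import numpy as np


def anm_hessian(coords, cutoff=12.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic elastic network.

    coords: (N, 3) array of node positions; cutoff and gamma are assumed values.
    """
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # nodes farther apart than the cutoff share no spring
            block = -gamma * np.outer(d, d) / dist2
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, sj] = block
            hessian[sj, si] = block
            hessian[si, si] -= block
            hessian[sj, sj] -= block
    return hessian


def linear_response(hessian, force, n_rigid=6):
    """Predict per-node displacements as pinv(H) @ force.

    The n_rigid lowest modes (rigid-body translations and rotations) are
    discarded, which assumes the network has no additional zero modes.
    """
    evals, evecs = np.linalg.eigh(hessian)
    evals, evecs = evals[n_rigid:], evecs[:, n_rigid:]
    pinv = (evecs / evals) @ evecs.T
    return (pinv @ force).reshape(-1, 3)


# Toy stand-in for an apo model: 20 nodes placed on a helix.
n = 20
t = np.arange(n)
angle = np.deg2rad(100.0) * t
coords = np.column_stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * t])

hessian = anm_hessian(coords)

# Hypothetical ligand contact: a unit force on node 10 along x.
force = np.zeros(3 * n)
force[30:33] = [1.0, 0.0, 0.0]

disp = linear_response(hessian, force)
holo_like = coords + disp  # displaced coordinates under the assumed force
```

In a real setting the force directions would have to come from a predicted binding site, and the magnitude of the response depends on the spring constant, so only the relative pattern of displacements is meaningful in this sketch.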