Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan.
Systems Immunology Laboratory, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan.
Nucleic Acids Res. 2019 Jul 2;47(W1):W5-W10. doi: 10.1093/nar/gkz342.
Here, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (https://mafft.cbrc.jp/alignment/server/) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10-20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.
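Because the benchmark results above are reported as Sum-of-Pairs (SP) scores, a minimal sketch of that measure may help. It assumes the common definition: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names `aligned_pairs` and `sum_of_pairs_score` are illustrative and are not part of MAFFT or DASH. Benchmark suites often score only selected core columns of the reference, and that restriction is not modeled here.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    Each residue is identified as (sequence_index, residue_index), so the
    result does not depend on where the gaps were placed.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all alignment rows must have the same length")
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    pairs = set()
    n_columns = len(alignment[0]) if alignment else 0
    for col in range(n_columns):
        present = []
        for seq_index, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(reference_pairs & aligned_pairs(test)) / len(reference_pairs)


if __name__ == "__main__":
    # Toy data: rows must list the same sequences in the same order.
    reference = ["AC-GT", "ACAGT"]
    test = ["ACG-T", "ACAGT"]
    print(sum_of_pairs_score(test, reference))
```

In this toy case the reference aligns four residue pairs and the test alignment reproduces three of them, so the score is 0.75. Both alignments must contain the same sequences in the same row order for the comparison to be meaningful.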