Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB-CSIC), Baldiri Reixach 15, 08028 Barcelona, Spain.
Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Road, Cambridge CB2 0XY, England.
Acta Crystallogr D Struct Biol. 2020 Mar 1;76(Pt 3):221-237. doi: 10.1107/S2059798320000339. Epub 2020 Feb 25.
Fragment-based molecular-replacement methods can solve a macromolecular structure quasi-ab initio. ARCIMBOLDO, using a common secondary-structure or tertiary-structure template or a library of folds, locates these with Phaser and reveals the rest of the structure by density modification and autotracing in SHELXE. The latter stage is challenging when the diffraction data are of lower resolution, the solvent content is low, the β-sheet composition is high, or the initial fragments represent a small fraction of the total scattering or are of low accuracy. SEQUENCE SLIDER aims to overcome these complications by extending the initial polyalanine fragment with side chains in a multisolution framework. Its use is illustrated on test cases and previously unknown structures. The selection and order of fragments to be extended follow the decrease in log-likelihood gain (LLG) calculated with Phaser upon the omission of each single fragment. When the starting substructure is derived from a remote homolog, sequence assignment to fragments is restricted by the original alignment. Otherwise, the predicted secondary structure is matched to that found in fragments and traces. Sequence hypotheses are trialled in a brute-force approach through side-chain building and refinement. Scoring the refined models through their LLG in Phaser may allow discrimination of the correct sequence or filter the best partial structures for further density modification and autotracing. The default limits for the number of models to pursue are hardware dependent. In its most economical implementation, suitable for a single laptop, the main-chain trace is extended as polyserine rather than trialling models with different sequence assignments; the latter requires a grid or multicore machine. SEQUENCE SLIDER has been instrumental in solving two novel structures: that of MltC from 2.7 Å resolution data and that of a pneumococcal lipoprotein with 638 residues and 35% solvent content.
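As a rough illustration of two steps described in the abstract, the sketch below ranks fragments by the drop in LLG observed when each is omitted, and enumerates sequence windows whose predicted secondary structure agrees with that of a fragment. It is a minimal sketch under stated assumptions, not the SEQUENCE SLIDER implementation: the function names, the `min_match` threshold, and the one-letter secondary-structure strings are hypothetical, and the LLG values are taken as precomputed inputs rather than obtained by calling Phaser.

```python
from typing import Dict, List, Tuple


def rank_fragments_by_llg_drop(
    full_llg: float, omit_llg: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Order fragments by how much the LLG falls when each one is omitted.

    full_llg: LLG of the complete substructure (assumed precomputed).
    omit_llg: LLG obtained with each named fragment left out (assumed precomputed).
    """
    drops = [(name, full_llg - llg) for name, llg in omit_llg.items()]
    # Largest drop first: these fragments contribute most to the score.
    return sorted(drops, key=lambda item: item[1], reverse=True)


def sliding_hypotheses(
    sequence: str, predicted_ss: str, fragment_ss: str, min_match: float = 0.8
) -> List[Tuple[int, str]]:
    """Slide a fragment-sized window along the sequence and keep the windows
    whose predicted secondary structure agrees with the fragment.

    min_match is a hypothetical threshold on the fraction of matching positions.
    Returns (start offset, window sequence) pairs.
    """
    if len(sequence) != len(predicted_ss):
        raise ValueError("sequence and predicted_ss must have the same length")
    n = len(fragment_ss)
    if n == 0:
        return []
    hypotheses = []
    for start in range(len(sequence) - n + 1):
        window_ss = predicted_ss[start:start + n]
        matches = sum(a == b for a, b in zip(window_ss, fragment_ss))
        if matches / n >= min_match:
            hypotheses.append((start, sequence[start:start + n]))
    return hypotheses


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(rank_fragments_by_llg_drop(100.0, {"A": 60.0, "B": 90.0, "C": 75.0}))
    print(sliding_hypotheses("MKTAYIAKQR", "CCHHHHHHCC", "HHHH", min_match=1.0))
```

In a real run, each surviving hypothesis would then go through side-chain building, refinement, and LLG scoring, as the abstract describes; those steps are outside the scope of this sketch.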