Carrozzini Benedetta, De Caro Liberato, Giannini Cinzia, Altomare Angela, Caliandro Rocco
Institute of Crystallography, National Research Council of Italy, via Amendola 122/o, Bari, 70126, Italy.
Acta Crystallogr A Found Adv. 2025 May 1;81(Pt 3):188-201. doi: 10.1107/S2053273325002797. Epub 2025 Apr 17.
The overall crystallographic process involves acquiring experimental data and using crystallographic software to find the structure solution. While diffracted intensities can be measured, the corresponding phases, which are needed to determine atomic positions, remain experimentally inaccessible (the phase problem). Direct methods and the Patterson approach have been successful in solving crystal structures but face limitations with large structures or low-resolution data. Current artificial intelligence (AI) based approaches, such as those recently developed by Larsen et al. [Science (2024), 385, 522-528], have been applied with success to solve centrosymmetric structures, where the phase is binary (0 or π). The current work proposes a new phasing method designed for AI integration that is also applicable to non-centrosymmetric structures, where the phase is a continuous variable. The approach discretizes the initial phase values for non-centrosymmetric structures into a few distinct values (e.g. values corresponding to the four quadrants). This reduces the phase problem from a continuous regression task to a multi-class classification problem, in which only a few phase seed values need to be determined. The discretization allows a smaller training dataset to be used for deep learning models, reducing computational complexity. The results of our feasibility study show that this method can effectively solve small, medium and large structures with a minimal set of phase seed values (three or four points in the interval [0, 2π]) and with 10% to 30% of the symmetry-independent reflections used as seeds. This phase-seeding method has the potential to extend AI-based approaches to solve crystal structures ab initio, regardless of complexity or symmetry, by combining AI classification algorithms with classical phasing procedures.
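The discretization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`phase_class`, `seed_phase`, `circular_error`) are hypothetical. The choice of the sector midpoint as the seed value is also an assumption, since the abstract says only that the values correspond to, for example, the four quadrants.

```python
import math

TWO_PI = 2.0 * math.pi


def phase_class(phi, n_classes=4):
    """Return the index (0..n_classes-1) of the angular sector containing phi (radians)."""
    width = TWO_PI / n_classes
    # The trailing modulo guards against phi % TWO_PI rounding up to TWO_PI.
    return int((phi % TWO_PI) // width) % n_classes


def seed_phase(cls, n_classes=4):
    """Return the discrete seed phase for a class.

    Assumption: the seed is the midpoint of the sector; the source
    abstract does not specify which representative value is used.
    """
    width = TWO_PI / n_classes
    return (cls + 0.5) * width


def circular_error(a, b):
    """Smallest absolute angular difference between two phases, in radians."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)


if __name__ == "__main__":
    # Illustrative phases only; these are not data from the paper.
    for phi in (0.3, 2.0, math.pi, 5.5):
        cls = phase_class(phi)
        seed = seed_phase(cls)
        print(f"phi={phi:.3f}  class={cls}  seed={seed:.3f}  "
              f"error={circular_error(phi, seed):.3f}")
```

Under the midpoint assumption, the error introduced by replacing a continuous phase with its seed is at most half a sector width (π/4 for four classes). This illustrates why a classifier needs to predict only the class label, leaving the refinement of the phase values to the classical phasing procedures mentioned in the abstract.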