Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic.
Nucleic Acids Res. 2020 Jun 19;48(11):6367-6381. doi: 10.1093/nar/gkaa383.
By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers, NtCs, which describe the geometry of RNA and DNA dinucleotides. NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify symbolic annotation of the prominent structural features of nucleic acids (NAs) and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class, but we demonstrate that up to half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the most populated DNA class, is sequence-neutral; instead, their distributions are significantly biased. The reported automated assignment of NtC classes and CANA codes, available at dnatco.org, provides a powerful tool for unbiased analysis of nucleic acid structures by structural and molecular biologists.
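The abstract does not describe how a dinucleotide is matched to an NtC class. Purely as an illustration, the sketch below assumes a nearest-centroid rule over backbone torsion angles with a rejection cutoff, so that a step far from every class stays unassigned (loosely mirroring the dinucleotides that fit no NtC class). The three-angle descriptor, the centroid values, the 30° cutoff and all names (`TOY_CENTROIDS`, `assign_class`, etc.) are hypothetical; this is not the dnatco.org procedure. The one point it does make concretely is that torsion angles must be compared on a circle, so that 175° and -160° are 25° apart, not 335°.

```python
import math
from typing import Dict, List, Optional

# Hypothetical centroids for a toy 3-torsion descriptor (degrees).
# Placeholder values for illustration only, NOT published NtC centroids.
TOY_CENTROIDS: Dict[str, List[float]] = {
    "toy_class_1": [80.0, -160.0, -70.0],
    "toy_class_2": [140.0, -180.0, -90.0],
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = (a - b) % 360.0  # Python's % always returns a value in [0, 360)
    return min(d, 360.0 - d)


def torsion_distance(x: List[float], y: List[float]) -> float:
    """Root-mean-square circular difference between two torsion vectors."""
    diffs = [angular_difference(a, b) for a, b in zip(x, y)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def assign_class(
    torsions: List[float],
    centroids: Dict[str, List[float]],
    cutoff: float = 30.0,  # hypothetical rejection threshold in degrees
) -> Optional[str]:
    """Return the nearest centroid label, or None if none lies within cutoff."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, centroid in centroids.items():
        dist = torsion_distance(torsions, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= cutoff else None


if __name__ == "__main__":
    print(assign_class([82.0, -158.0, -72.0], TOY_CENTROIDS))
    print(assign_class([-60.0, 60.0, 180.0], TOY_CENTROIDS))
```

In this toy setup the first step lies close to `toy_class_1`, while the second is far from both centroids and is left unassigned (`None`).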