Milchevskiy Yury V, Kravatskaya Galina I, Kravatsky Yury V
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia.
Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia.
Int J Mol Sci. 2024 Nov 22;25(23):12555. doi: 10.3390/ijms252312555.
The physicochemical properties of amino acid residues from the AAindex database are widely used as predictors in models of protein structure and properties. The AAindex database, however, contains data only for the 20 canonical amino acids. Non-canonical amino acids, while less common, are not rare; proteins in the Protein Data Bank (PDB) contain more than 1000 distinct non-canonical amino acids. In this study, we propose a method for estimating AAindex physicochemical properties for non-canonical amino acids and assess the quality of the predictions. We implemented the method as a bioinformatics tool and estimated the physicochemical properties of the non-canonical amino acids found in the PDB, using SMILES representations of their chemical composition obtained from the PDBeChem databank. The tool and the resulting database of estimated properties are freely available on the authors' website and can be downloaded via GitHub.