Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan.
Biomed Res Int. 2014;2014:753428. doi: 10.1155/2014/753428. Epub 2014 May 11.
Progress in the "omics" fields, such as genomics, transcriptomics, proteomics, and metabolomics, has created a need for innovative analytical techniques to derive meaningful information from ever-increasing volumes of molecular data. KNApSAcK motorcycle DB is a popular database of enzymes related to secondary metabolic pathways in plants. One challenge in analyzing protein sequence data in such repositories is that sequences are conventionally written as strings of alphabetic characters. Such strings lack a natural underlying metric, which makes them poorly amenable to computation. To address this, we applied a novel integration of selected biochemical and physical attributes of amino acids, derived from the amino acid index and quantified on a numerical scale, to examine the diversity of peptide sequences of terpenoid synthases accumulated in KNApSAcK motorcycle DB. We first generated a reduced amino acid index table: a set of biochemical and physical properties obtained by random forest feature selection of important indices from the amino acid index. Principal component analysis was then applied to characterize enzymes involved in terpenoid synthesis. Incorporating residue attributes into the analyses increased the variance explained.
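The general idea of replacing letter strings with numerical amino acid properties and then running principal component analysis can be illustrated with a minimal sketch. It assumes a hypothetical two-property table (Kyte-Doolittle hydropathy plus a crude +1/-1/0 charge flag), not the indices actually chosen by the random forest step. The per-sequence summary features (mean and standard deviation of hydropathy, mean charge), the function names `encode` and `pca`, and the toy sequences are all illustrative assumptions.

```python
import numpy as np

# Hypothetical reduced amino acid index table (illustrative only, NOT the
# paper's selected indices): Kyte-Doolittle hydropathy and a crude charge flag.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}  # all other residues -> 0


def encode(seq: str) -> np.ndarray:
    """Turn a sequence string into a fixed-length numeric vector.

    Features (assumed for illustration): mean hydropathy, population standard
    deviation of hydropathy, and mean charge. Non-standard letters raise KeyError.
    """
    seq = seq.upper()
    hyd = np.array([HYDROPATHY[aa] for aa in seq])
    chg = np.array([CHARGE.get(aa, 0.0) for aa in seq])
    return np.array([hyd.mean(), hyd.std(), chg.mean()])


def pca(X: np.ndarray, n_components: int = 2):
    """PCA via SVD of the standardized feature matrix.

    Returns the component scores and the explained variance ratio of the
    retained components (ratios are sorted in descending order).
    """
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant features
    Z = Xc / sd
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var = S**2 / (X.shape[0] - 1)
    ratio = var / var.sum()
    scores = Z @ Vt[:n_components].T
    return scores, ratio[:n_components]


if __name__ == "__main__":
    # Made-up toy peptides, not real terpenoid synthase sequences.
    seqs = ["MKLLVAIDDE", "MRKEDGSTPL", "VVILLAFGWC", "MSTNDEKRRQ", "AILVFMWCGK"]
    X = np.vstack([encode(s) for s in seqs])
    scores, ratio = pca(X, n_components=2)
    print(scores.round(3))
    print("explained variance ratio:", ratio.round(3))
```

Adding more property columns to the table (the analogue of incorporating more residue attributes) only widens `X`; the PCA step itself is unchanged. The random forest selection that produces the reduced table is not reproduced here.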