Contreras-Torres Ernesto, Marrero-Ponce Yovani
Norwegian Cruise Line Holdings Limited, Corporate Center Drive, Miami, Florida 33216, United States.
Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México 03920, México.
J Chem Inf Model. 2024 Dec 9;64(23):8665-8672. doi: 10.1021/acs.jcim.3c01189. Epub 2024 Nov 18.
Several computational tools have been developed to calculate sequence-based molecular descriptors (MDs) for peptides and proteins. However, these tools have certain limitations: 1) they generally lack capabilities for curating input data; 2) their outputs often exhibit significant overlap; 3) few MDs are available at the amino acid (AA) level; and 4) they lack flexibility in computing specific MDs. To address these issues, we developed MD-LAIS (Molecular Descriptors from Local Amino acid Invariants), Java-based software designed to compute both whole-sequence and AA-level MDs for peptides and proteins. These MDs are generated by applying aggregation operators (AOs) to macromolecular vectors containing the chemical-physical and structural properties of AAs. The set of AOs includes both nonclassical (e.g., Minkowski norms) and classical (e.g., Radial Distribution Function) operators. Classical AOs capture neighborhood structural information at different levels, while nonclassical AOs are applied using a sliding window to generalize the AA-level output. A weighting system based on fuzzy membership functions is also included to account for the contributions of individual AAs. MD-LAIS features: 1) a module for data curation tasks, 2) a feature selection module, 3) projects of highly relevant MDs, and 4) low-dimensional lists of informative global and AA-level MDs. Overall, we expect that MD-LAIS will be a valuable tool for encoding protein or peptide sequences. The software is freely available as a stand-alone system on GitHub (https://github.com/Grupo-Medicina-Molecular-y-Traslacional/MD_LAIS).
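The idea of applying an aggregation operator to a vector of amino acid properties can be illustrated with a minimal sketch. It is not the MD-LAIS implementation or its API: the function names (`property_vector`, `minkowski_norm`, `sliding_window_norm`), the choice of the Kyte-Doolittle hydropathy scale as the property, and the window handling at the sequence ends are all assumptions made for illustration. The fuzzy weighting scheme and the classical operators mentioned in the abstract are not modeled here.

```python
# Illustrative sketch only; all names are hypothetical, not the MD-LAIS API.

# Kyte-Doolittle hydropathy index, used here as one example amino acid property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(sequence, scale=KYTE_DOOLITTLE):
    """Map a sequence to a vector of per-residue property values."""
    return [scale[aa] for aa in sequence.upper()]


def minkowski_norm(values, p=2):
    """Minkowski (L^p) norm of a vector: one whole-sequence descriptor."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def sliding_window_norm(values, half_width=1, p=2):
    """Apply the norm in a window centered on each residue.

    Returns one value per residue (an amino-acid-level descriptor).
    Windows are truncated at the sequence ends (an assumption of this sketch).
    """
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        result.append(minkowski_norm(values[lo:hi], p))
    return result


if __name__ == "__main__":
    vec = property_vector("ACDKLV")
    print("whole-sequence L2 norm:", minkowski_norm(vec, p=2))
    print("per-residue windowed L2 norm:", sliding_window_norm(vec, half_width=1))
```

In this sketch, the same operator yields a single global descriptor when applied to the full vector and a per-residue profile when applied inside a sliding window, which mirrors the whole-sequence versus AA-level distinction drawn in the abstract.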