MD-LAIs Software: Computing Whole-Sequence and Amino Acid-Level "Embeddings" for Peptides and Proteins.

Author information

Contreras-Torres Ernesto, Marrero-Ponce Yovani

Affiliations

Norwegian Cruise Line Holdings Limited, Corporate Center Drive, Miami, Florida 33216, United States.

Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México 03920, México.

Publication information

J Chem Inf Model. 2024 Dec 9;64(23):8665-8672. doi: 10.1021/acs.jcim.3c01189. Epub 2024 Nov 18.

DOI:10.1021/acs.jcim.3c01189

PMID:39552512

Abstract

Several computational tools have been developed to calculate sequence-based molecular descriptors (MDs) for peptides and proteins. However, these tools have certain limitations: 1) They generally lack capabilities for curating input data. 2) Their outputs often exhibit significant overlap. 3) There is limited availability of MDs at the amino acid (AA) level. 4) They lack flexibility in computing specific MDs. To address these issues, we developed MD-LAIs (Molecular Descriptors from Local Amino acid Invariants), Java-based software designed to compute both whole-sequence and AA-level MDs for peptides and proteins. These MDs are generated by applying aggregation operators (AOs) to macromolecular vectors containing the chemical-physical and structural properties of AAs. The set of AOs includes both nonclassical (e.g., Minkowski norms) and classical (e.g., Radial Distribution Function) operators. Classical AOs capture neighborhood structural information at different levels, while nonclassical AOs are applied using a sliding window to generalize the AA-level output. A weighting system based on fuzzy membership functions is also included to account for the contributions of individual AAs. MD-LAIs features: 1) a module for data curation tasks, 2) a feature selection module, 3) projects of highly relevant MDs, and 4) low-dimensional lists of informative global and AA-level MDs. Overall, we expect that MD-LAIs will be a valuable tool for encoding protein or peptide sequences. The software is freely available as a stand-alone system on GitHub (https://github.com/Grupo-Medicina-Molecular-y-Traslacional/MD_LAIS).
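
To make the idea of aggregating a per-residue property vector more concrete, here is a minimal Python sketch. It is not the software's API and does not reproduce its descriptors: the function names, the choice of the Kyte-Doolittle hydropathy scale as the residue property, the window size, and the truncation of the window at the sequence ends are all assumptions made for illustration. It shows a Minkowski norm applied once over the whole sequence (a global value) and again inside a sliding window (one value per residue).

```python
# Kyte-Doolittle hydropathy values, used here only as an example residue property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(seq):
    """Map a sequence to its per-residue property values (hypothetical helper).

    Raises KeyError for any character outside the 20 standard residues.
    """
    return [KD[aa] for aa in seq.upper()]


def minkowski_norm(values, p=2):
    """Minkowski p-norm: (sum of |v|^p) ^ (1/p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def global_descriptor(seq, p=2):
    """One value for the whole sequence."""
    return minkowski_norm(property_vector(seq), p)


def aa_level_descriptors(seq, window=3, p=2):
    """One value per residue: the norm over a window centred on that residue.

    The window is truncated at the sequence ends (an assumption of this sketch).
    """
    vec = property_vector(seq)
    half = window // 2
    out = []
    for i in range(len(vec)):
        lo = max(0, i - half)
        hi = min(len(vec), i + half + 1)
        out.append(minkowski_norm(vec[lo:hi], p))
    return out
```

Calling `global_descriptor("GIGAVLKV")` yields a single number for the peptide, while `aa_level_descriptors("GIGAVLKV", window=3)` returns a list with one entry per residue. Swapping `KD` for another property table or changing `p` gives a different descriptor from the same machinery.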

Similar articles

When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler?

Mol Divers. 2020 Nov;24(4):913-932. doi: 10.1007/s11030-019-10002-3. Epub 2019 Oct 28.

Distributed and multicore QuBiLS-MIDAS software v2.0: Computing chiral, fuzzy, weighted and truncated geometrical molecular descriptors based on tensor algebra.

J Comput Chem. 2020 May 5;41(12):1209-1227. doi: 10.1002/jcc.26167. Epub 2020 Feb 14.

MuLiMs-MCoMPAs: A Novel Multiplatform Framework to Compute Tensor Algebra-Based Three-Dimensional Protein Descriptors.

J Chem Inf Model. 2020 Feb 24;60(2):1042-1059. doi: 10.1021/acs.jcim.9b00629. Epub 2019 Oct 30.

QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations.

J Cheminform. 2017 Jun 7;9(1):35. doi: 10.1186/s13321-017-0211-5.

RaaMLab: A MATLAB toolbox that generates amino acid groups and reduced amino acid modes.

Biosystems. 2019 Jun;180:38-45. doi: 10.1016/j.biosystems.2019.03.002. Epub 2019 Mar 21.

CaLMPhosKAN: prediction of general phosphorylation sites in proteins via fusion of codon aware embeddings with amino acid aware embeddings and wavelet-based Kolmogorov-Arnold network.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf124.

Choquet integral-based fuzzy molecular characterizations: when global definitions are computed from the dependency among atom/bond contributions (LOVIs/LOEIs).

J Cheminform. 2018 Oct 25;10(1):51. doi: 10.1186/s13321-018-0306-7.

Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence.

Nucleic Acids Res. 2011 Jul;39(Web Server issue):W385-90. doi: 10.1093/nar/gkr284. Epub 2011 May 23.

propy: a tool to generate various modes of Chou's PseAAC.

Bioinformatics. 2013 Apr 1;29(7):960-2. doi: 10.1093/bioinformatics/btt072. Epub 2013 Feb 19.

Cited by

Optimal Descriptor Subset Search via Chemical Information and Target Activity-Guided Algorithm for Antimicrobial Peptide Prediction.

J Chem Inf Model. 2025 Jul 14;65(13):6621-6631. doi: 10.1021/acs.jcim.5c00600. Epub 2025 Jun 18.
