Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets.

Affiliation

Computational & Structural Chemistry, GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, U.K.

Publication information

J Chem Inf Model. 2010 Mar 22;50(3):339-48. doi: 10.1021/ci900450m.

DOI:10.1021/ci900450m

PMID:20121045

Abstract

Modern drug discovery organizations generate large volumes of SAR data. A promising methodology that can be used to mine these chemical data to identify novel structure-activity relationships is the matched molecular pair (MMP) methodology. However, before the full potential of the MMP methodology can be utilized, an MMP identification method that is capable of identifying all MMPs in large chemical data sets on modest computational hardware is required. In this paper we report an algorithm that is capable of systematically generating all MMPs in chemical data sets. Additionally, the algorithm is computationally efficient enough to be applied to large data sets. As an example, the algorithm was used to identify the MMPs in the NIH MLSMR set of approximately 300k compounds. The algorithm identified approximately 5.3 million matched molecular pairs in the set. These pairs cover approximately 2.6 million unique molecular transformations.
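
The abstract does not spell out how the pairs are generated, so the following is only a minimal sketch of one common way to find MMPs at scale: index each molecule by the part it keeps (the constant part) and pair up molecules that share that part. It is not the authors' implementation. The molecule IDs, the hand-written fragment strings, and the function name `find_matched_pairs` are all assumptions made for illustration; in practice the fragmentations would come from a cheminformatics toolkit.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written single-cut fragmentations:
# (molecule id, constant part, variable part). Illustrative data only.
FRAGMENTATIONS = [
    ("mol_A", "[*]c1ccccc1", "[*]Cl"),
    ("mol_B", "[*]c1ccccc1", "[*]F"),
    ("mol_C", "[*]c1ccccc1", "[*]OC"),
    ("mol_D", "[*]C1CCCCC1", "[*]Cl"),
    ("mol_E", "[*]C1CCCCC1", "[*]F"),
]


def find_matched_pairs(fragmentations):
    """Group fragmentations by constant part, then pair molecules within each group."""
    index = defaultdict(list)
    for mol_id, constant, variable in fragmentations:
        index[constant].append((mol_id, variable))

    pairs = []
    for constant, members in index.items():
        for (id1, var1), (id2, var2) in combinations(members, 2):
            if var1 != var2:  # identical variable parts describe no change
                pairs.append((id1, id2, f"{var1}>>{var2}", constant))
    return pairs


pairs = find_matched_pairs(FRAGMENTATIONS)
# The same transformation can occur in several pairs, so count it once.
transformations = {transform for _, _, transform, _ in pairs}
print(len(pairs), "pairs,", len(transformations), "unique transformations")
```

In this toy data the Cl-to-F change occurs on two different constant parts, which is why the number of unique transformations is smaller than the number of pairs, the same distinction the abstract draws between its pair and transformation counts. Grouping by a shared key avoids comparing every molecule against every other one directly.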

Similar articles

Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets.

J Chem Inf Model. 2010 Mar 22;50(3):339-48. doi: 10.1021/ci900450m.

From activity cliffs to activity ridges: informative data structures for SAR analysis.

J Chem Inf Model. 2011 Aug 22;51(8):1848-56. doi: 10.1021/ci2002473. Epub 2011 Aug 4.

Capturing structure-activity relationships from chemogenomic spaces.

J Chem Inf Model. 2011 Apr 25;51(4):843-51. doi: 10.1021/ci100270x. Epub 2011 Mar 16.

A data mining method to facilitate SAR transfer.

J Chem Inf Model. 2011 Aug 22;51(8):1857-66. doi: 10.1021/ci200254k. Epub 2011 Aug 8.

A scalable approach to combinatorial library design for drug discovery.

J Chem Inf Model. 2008 Jan;48(1):27-41. doi: 10.1021/ci700023y. Epub 2007 Dec 6.

DISE: directed sphere exclusion.

J Chem Inf Comput Sci. 2003 Jan-Feb;43(1):317-23. doi: 10.1021/ci025554v.

WizePairZ: a novel algorithm to identify, encode, and exploit matched molecular pairs with unspecified cores in medicinal chemistry.

J Chem Inf Model. 2010 Aug 23;50(8):1350-7. doi: 10.1021/ci100084s.

The ensemble bridge algorithm: a new modeling tool for drug discovery problems.

J Chem Inf Model. 2010 Feb 22;50(2):309-16. doi: 10.1021/ci9003392.

MMP-Cliffs: systematic identification of activity cliffs on the basis of matched molecular pairs.

J Chem Inf Model. 2012 May 25;52(5):1138-45. doi: 10.1021/ci3001138. Epub 2012 Apr 17.

PARM--an efficient algorithm to mine association rules from spatial data.

IEEE Trans Syst Man Cybern B Cybern. 2008 Dec;38(6):1513-24. doi: 10.1109/TSMCB.2008.927730.

Cited by

A Data-Driven Perspective on Bioisostere Evaluation: Mapping the Benzene Bioisostere Landscape with BioSTAR.

J Med Chem. 2025 Aug 28;68(16):16921-16939. doi: 10.1021/acs.jmedchem.5c01641. Epub 2025 Aug 5.

Enhancing Drug-Target Interaction Prediction through Transfer Learning from Activity Cliff Prediction Tasks.

J Chem Inf Model. 2025 Jul 14;65(13):6558-6567. doi: 10.1021/acs.jcim.5c00484. Epub 2025 Jun 30.

Context-dependent similarity searching for small molecular fragments.

J Cheminform. 2025 May 26;17(1):83. doi: 10.1186/s13321-025-01032-1.

Context-dependent similarity analysis of analogue series for structure-activity relationship transfer based on a concept from natural language processing.

J Cheminform. 2025 Jan 15;17(1):5. doi: 10.1186/s13321-025-00951-3.

PbsNRs: predict the potential binders and scaffolds for nuclear receptors.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae710.

ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction.

J Cheminform. 2025 Jan 10;17(1):3. doi: 10.1186/s13321-025-00947-z.

Activity Cliff-Informed Contrastive Learning for Molecular Property Prediction.

Res Sq. 2024 Dec 4:rs.3.rs-2988283. doi: 10.21203/rs.3.rs-2988283/v2.

CLigOpt: controllable ligand design through target-specific optimization.

Bioinformatics. 2024 Sep 1;40(Suppl 2):ii62-ii69. doi: 10.1093/bioinformatics/btae396.

Identification of lysosomotropism using explainable machine learning and morphological profiling cell painting data.

RSC Med Chem. 2024 May 24;15(8):2677-2691. doi: 10.1039/d4md00107a. eCollection 2024 Aug 14.

Predictions of Colloidal Molecular Aggregation Using AI/ML Models.

ACS Omega. 2024 Jun 18;9(26):28691-28706. doi: 10.1021/acsomega.4c02886. eCollection 2024 Jul 2.
