Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets.

Affiliation

Computational & Structural Chemistry, GlaxoSmithKline, Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, U.K.

Publication information

J Chem Inf Model. 2010 Mar 22;50(3):339-48. doi: 10.1021/ci900450m.

Abstract

Modern drug discovery organizations generate large volumes of structure-activity relationship (SAR) data. A promising methodology for mining these chemical data to identify novel structure-activity relationships is the matched molecular pair (MMP) methodology. However, before the full potential of the MMP methodology can be realized, an MMP identification method is required that can identify all MMPs in large chemical data sets on modest computational hardware. In this paper we report an algorithm that systematically generates all MMPs in chemical data sets. Additionally, the algorithm is computationally efficient enough to be applied to large data sets. As an example, the algorithm was used to identify the MMPs in the NIH MLSMR set of approximately 300k compounds. The algorithm identified approximately 5.3 million matched molecular pairs in the set. These pairs cover approximately 2.6 million unique molecular transformations.
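The abstract does not spell out how the algorithm works, so the following is only a toy sketch of what identifying MMPs involves, not the authors' implementation. It assumes each molecule has already been split into a constant part and a variable part (the hand-written `FRAGMENTS` list, the SMILES-like strings, and the function name `find_matched_pairs` are all hypothetical). Grouping molecules by their shared constant part avoids comparing every molecule against every other, and any two molecules in the same group that differ in the variable part form a matched pair.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written single-cut fragmentations:
# (molecule id, constant part, variable part).
# A real workflow would derive these with a cheminformatics toolkit;
# the SMILES-like strings below are illustrative only.
FRAGMENTS = [
    ("mol1", "c1ccccc1[*]", "[*]Cl"),
    ("mol2", "c1ccccc1[*]", "[*]F"),
    ("mol3", "c1ccccc1[*]", "[*]OC"),
    ("mol4", "C1CCCCC1[*]", "[*]Cl"),
]


def find_matched_pairs(fragments):
    """Return (id_a, id_b, transformation) for molecules sharing a constant part."""
    # Index every fragmentation by its constant part.
    index = defaultdict(list)
    for mol_id, constant, variable in fragments:
        index[constant].append((mol_id, variable))

    # Molecules filed under the same constant part differ only in the
    # variable part, so each such pair is a matched molecular pair.
    pairs = []
    for members in index.values():
        for (id_a, var_a), (id_b, var_b) in combinations(members, 2):
            if var_a != var_b:
                pairs.append((id_a, id_b, f"{var_a}>>{var_b}"))
    return pairs


pairs = find_matched_pairs(FRAGMENTS)
for id_a, id_b, transformation in pairs:
    print(id_a, id_b, transformation)
```

In this toy input, `mol1`, `mol2`, and `mol3` share a constant part and so pair with one another, while `mol4` has no partner. The work is one pass to build the index plus enumeration within each group, which is the kind of property that makes such a search feasible on large sets.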
