Similarity Downselection: Finding the Most Dissimilar Molecular Conformers for Reference-Free Metabolomics.

Author information

Nielson Felicity F, Kay Bill, Young Stephen J, Colby Sean M, Renslow Ryan S, Metz Thomas O

Affiliations

Pacific Northwest National Laboratory, Biological Sciences Division, Richland, WA 99354, USA.

Pacific Northwest National Laboratory, Advanced Computing, Mathematics, and Data Division, Richland, WA 99354, USA.

Publication information

Metabolites. 2023 Jan 9;13(1):105. doi: 10.3390/metabo13010105.

DOI:10.3390/metabo13010105

PMID:36677030

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9864474/

Abstract

Computational methods for creating in silico libraries of molecular descriptors (e.g., collision cross sections) are becoming increasingly prevalent due to the limited number of authentic reference materials available for traditional library building. These so-called "reference-free metabolomics" methods require sampling sets of molecular conformers in order to produce high-accuracy property predictions. Because the subsequent calculations for each conformer are computationally costly, there is a need to sample the most relevant subset and avoid repeating calculations on conformers that are nearly identical. The goal of this study is to introduce a heuristic method of finding the most dissimilar conformers from a larger population in order to help speed up reference-free calculation methods while maintaining high property prediction accuracy. Finding the set of the n items most dissimilar from each other out of a larger population becomes increasingly difficult and computationally expensive as either n or the population size grows large. Because there exists a pairwise relationship between each item and all other items in the population, finding the set of the n most dissimilar items is different from simply sorting an array of numbers. For instance, if you have a set of the most dissimilar n = 4 items, one or more of the items from n = 4 might not be in the set n = 5. An exact solution would have to search all possible combinations of size n in the population exhaustively. We present an open-source software called similarity downselection (SDS), written in Python and freely available on GitHub. SDS implements a heuristic algorithm for quickly finding the approximate set(s) of the n most dissimilar items. We benchmark SDS against a Monte Carlo method, which attempts to find the exact solution through repeated random sampling. We show that, in finding the set of n most dissimilar conformers, SDS is not only orders of magnitude faster but also more accurate than running Monte Carlo for 1,000,000 iterations, each searching for set sizes n = 3-7 out of a population of 50,000. We also benchmark SDS against the exact solution for example small populations, showing that SDS produces a solution close to the exact solution in these instances. Using theoretical approaches, we also demonstrate the constraints of the greedy algorithm and its efficacy as a ratio to the exact solution.
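The three strategies contrasted in the abstract (a greedy heuristic, exhaustive search, and Monte Carlo sampling) can be illustrated with a minimal sketch. This is not the SDS implementation: the abstract does not specify the objective or the selection rule, so the sketch assumes the goal is to maximize the sum of pairwise dissimilarities, and it uses random 2D points with Euclidean distance as a stand-in for conformers and their pairwise dissimilarities. All function names are hypothetical.

```python
import itertools
import math
import random

def dispersion(dist, subset):
    """Sum of pairwise dissimilarities within subset (assumed objective)."""
    return sum(dist[i][j] for i, j in itertools.combinations(subset, 2))

def greedy_most_dissimilar(dist, n):
    """Greedy heuristic (illustrative, not the SDS algorithm): seed with the
    most dissimilar pair, then repeatedly add the item whose total
    dissimilarity to the already chosen items is largest."""
    if n < 2:
        raise ValueError("n must be at least 2")
    size = len(dist)
    chosen = list(max(itertools.combinations(range(size), 2),
                      key=lambda pair: dist[pair[0]][pair[1]]))
    while len(chosen) < n:
        remaining = (k for k in range(size) if k not in chosen)
        chosen.append(max(remaining,
                          key=lambda k: sum(dist[k][c] for c in chosen)))
    return chosen

def exact_most_dissimilar(dist, n):
    """Exhaustive search over every size-n combination; small populations only."""
    return list(max(itertools.combinations(range(len(dist)), n),
                    key=lambda subset: dispersion(dist, subset)))

def monte_carlo_most_dissimilar(dist, n, iterations, seed=0):
    """Keep the best of `iterations` uniformly random size-n subsets."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(iterations):
        candidate = rng.sample(range(len(dist)), n)
        score = dispersion(dist, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Toy population: 12 random 2D points standing in for conformers (assumption).
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(12)]
dist = [[math.dist(p, q) for q in points] for p in points]

n = 4
greedy = greedy_most_dissimilar(dist, n)
exact = exact_most_dissimilar(dist, n)
mc = monte_carlo_most_dissimilar(dist, n, iterations=200)

for name, subset in [("greedy", greedy), ("exact", exact), ("monte carlo", mc)]:
    print(name, sorted(subset), round(dispersion(dist, subset), 4))

# Number of size-5 subsets an exhaustive search would face for 50,000 items.
print(math.comb(50_000, 5))
```

On this toy population the exhaustive search only has to score 495 subsets, whereas the final line shows why it is impractical at the scale in the abstract: the number of size-5 subsets of 50,000 items is on the order of 10^21. The greedy sketch scores each remaining item once per added element, which is why a heuristic of this kind can stay fast as the population grows, at the cost of not guaranteeing the exact optimum.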

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9586/9864474/09f8d62e0023/metabolites-13-00105-g001.jpg

Cited by

GCMS-ID: a webserver for identifying compounds from gas chromatography mass spectrometry experiments.

Nucleic Acids Res. 2024 Jul 5;52(W1):W381-W389. doi: 10.1093/nar/gkae425.

Correction: Nielson et al. Similarity Downselection: Finding the Most Dissimilar Molecular Conformers for Reference-Free Metabolomics. Metabolites 2023, 13, 105.

Metabolites. 2023 Nov 17;13(11):1158. doi: 10.3390/metabo13111158.

