Hart Chloe Engler, Gadiya Yojana, Kind Tobias, Krettler Christoph A, Gaetz Matthew, Misra Biswapriya B, Healey David, Allen August, Colluru Viswa, Domingo-Fernández Daniel
Enveda, Boulder, CO 80301, United States.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf033.
The plant kingdom, encompassing nearly 400,000 known species, produces an immense diversity of metabolites, including primary compounds essential for survival and secondary metabolites specialized for ecological interactions. These metabolites constitute a vast and complex phytochemical space with significant potential applications in medicine, agriculture, and biotechnology. However, much of this chemical diversity remains unexplored, as only a fraction of plant species has been studied comprehensively. In this work, we estimate the size of the plant chemical space by leveraging large-scale metabolomics and literature datasets. We begin by examining the known chemical space, which contains at most several hundred thousand unique compounds and remains sparsely covered. Using data from over 1,000 plant species, we apply several mass spectrometry-based approaches (a formula prediction model, a de novo prediction model, a combination of library search and de novo prediction, and MS2 clustering) to estimate the number of unique structures. Our methods suggest that the number of unique compounds in the metabolomics dataset alone may already surpass existing estimates of plant chemical diversity. Finally, we project these findings across the entire plant kingdom, estimating that the total plant chemical space likely spans millions of compounds, if not more, most of them still unexplored.