Villalba Héctor, Llambrich Maria, Gumà Josep, Brezmes Jesús, Cumeras Raquel
Department of Oncology, Hospital Universitari Sant Joan de Reus, Institut d'Investigació Sanitària Pere Virgili (IISPV), CERCA, 43204 Reus, Spain.
Department of Electrical Electronic Engineering and Automation, University of Rovira i Virgili (URV), 43007 Tarragona, Spain.
Metabolites. 2023 Nov 21;13(12):1167. doi: 10.3390/metabo13121167.
Metabolomics encounters challenges in cross-study comparisons due to diverse metabolite nomenclature and reporting practices. To bridge this gap, we introduce the Metabolites Merging Strategy (MMS), a systematic framework for harmonizing multiple metabolite datasets to enhance interstudy comparability. MMS has three steps. Step 1: translation and merging of the different datasets, using InChIKeys for data integration and translating metabolite names where needed. Step 2: retrieval of attributes from the InChIKey, including name descriptors (title name from PubChem and RefMet name from Metabolomics Workbench), chemical properties (molecular weight and molecular formula), systematic identifiers (InChI, InChIKey, SMILES), non-systematic identifiers (PubChem, ChEBI, HMDB, KEGG, LipidMaps, DrugBank, Bin ID, and CAS number), and their ontology. Step 3: a three-step curation process that rectifies disparities for conjugate base/acid compounds (optional step), fills in missing attributes, and checks synonyms (duplicated information). The MMS procedure is exemplified through a case study of urinary asthma metabolites, in which MMS facilitated the identification of significant pathways that remained hidden when no dataset merging strategy was followed. This study highlights the need for standardized and unified metabolite datasets to enhance the reproducibility and comparability of metabolomics studies.
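The merging in Step 1 and the synonym and conjugate base/acid checks in Step 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `merge_by_inchikey`, the record layout, and the `ignore_protonation` flag are hypothetical, and the two small study tables are toy data. The sketch assumes standard 27-character InChIKeys, in which the final character is the protonation flag, so that a conjugate acid/base pair is treated as one compound when the first 25 characters match. That assumption is a simplification and may not hold for every charged species.

```python
from collections import defaultdict

# Toy inputs (illustrative only): each study reports metabolites under its own names.
study_a = [
    {"name": "Acetic acid", "inchikey": "QTBSBXVTEAMEQO-UHFFFAOYSA-N"},
    {"name": "Citric acid", "inchikey": "KRKNYBCHXYNGOX-UHFFFAOYSA-N"},
]
study_b = [
    {"name": "Acetate", "inchikey": "QTBSBXVTEAMEQO-UHFFFAOYSA-M"},
    {"name": "Ethanoic acid", "inchikey": "QTBSBXVTEAMEQO-UHFFFAOYSA-N"},
]


def merge_by_inchikey(studies, ignore_protonation=False):
    """Group reported names and source studies by InChIKey.

    studies: dict mapping a study label to a list of records,
             each record having "name" and "inchikey" fields.
    ignore_protonation: if True, drop the trailing "-X" protonation
             character so conjugate acid/base forms share one key
             (hypothetical option mirroring the optional curation step).
    """
    merged = defaultdict(lambda: {"names": set(), "studies": set()})
    for study_id, records in studies.items():
        for rec in records:
            key = rec["inchikey"].strip().upper()
            if ignore_protonation:
                key = key[:25]  # connectivity block + stereo/version block
            merged[key]["names"].add(rec["name"])
            merged[key]["studies"].add(study_id)
    return dict(merged)


studies = {"A": study_a, "B": study_b}

# Strict merge: synonyms with the same InChIKey collapse into one entry.
strict = merge_by_inchikey(studies)

# Optional curation: acid and conjugate base collapse as well.
relaxed = merge_by_inchikey(studies, ignore_protonation=True)

for key, entry in sorted(relaxed.items()):
    print(key, sorted(entry["names"]), sorted(entry["studies"]))
```

In the strict merge, "Acetic acid" and "Ethanoic acid" fall under one key while "Acetate" stays separate. With `ignore_protonation=True`, all three names are grouped and attributed to both studies, which is the kind of overlap that would otherwise be missed when comparing studies by name alone.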