Workflow for Integrated Processing of Multicohort Untargeted ¹H NMR Metabolomics Data in Large-Scale Metabolic Epidemiology.

Author information

Karaman Ibrahim, Ferreira Diana L S, Boulangé Claire L, Kaluarachchi Manuja R, Herrington David, Dona Anthony C, Castagné Raphaële, Moayyeri Alireza, Lehne Benjamin, Loh Marie, de Vries Paul S, Dehghan Abbas, Franco Oscar H, Hofman Albert, Evangelou Evangelos, Tzoulaki Ioanna, Elliott Paul, Lindon John C, Ebbels Timothy M D

Affiliations

Medical Research Council - Public Health England (MRC-PHE) Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, W2 1PG, London, United Kingdom.

Metabometrix, Ltd., Bioincubator Unit, Bessemer Building, Prince Consort Road, SW7 2BP South Kensington, London, United Kingdom.

Publication information

J Proteome Res. 2016 Dec 2;15(12):4188-4194. doi: 10.1021/acs.jproteome.6b00125. Epub 2016 Oct 6.

Abstract

Large-scale metabolomics studies involving thousands of samples present multiple challenges in data analysis, particularly when an untargeted platform is used. Studies with multiple cohorts and analysis platforms exacerbate existing problems such as peak alignment and normalization. Therefore, there is a need for robust processing pipelines that can ensure reliable data for statistical analysis. The COMBI-BIO project incorporates serum from ∼8000 individuals, in three cohorts, profiled by six assays in two phases using both ¹H NMR and UPLC-MS. Here we present the COMBI-BIO NMR analysis pipeline and demonstrate its fitness for purpose using representative quality control (QC) samples. NMR spectra were first aligned and normalized. After eliminating interfering signals, outliers identified using Hotelling's T² were removed and a cohort/phase adjustment was applied, resulting in two NMR data sets (CPMG and NOESY). Alignment of the NMR data was shown to increase the correlation-based alignment quality measure from 0.319 to 0.391 for CPMG and from 0.536 to 0.586 for NOESY, showing that the improvement was present across both large and small peaks. End-to-end quality assessment of the pipeline was achieved using Hotelling's T² distributions. For CPMG spectra, the interquartile range decreased from 1.425 in raw QC data to 0.679 in processed spectra, while the corresponding change for NOESY spectra was from 0.795 to 0.636, indicating an improvement in precision following processing. PCA indicated that gross phase and cohort differences were no longer present. These results illustrate that the pipeline produces robust and reproducible data, successfully addressing the methodological challenges of this large multifaceted study.
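The quality measure described above, the spread of Hotelling's T² values across QC samples, can be illustrated with a minimal sketch. It is not the COMBI-BIO implementation: the abstract does not state how many principal components were used or what outlier cutoff was applied, so the function names `hotelling_t2` and `iqr`, the choice of two components, and the synthetic `spectra` matrix are all assumptions made for illustration.

```python
import numpy as np

def hotelling_t2(X, n_components=2):
    """Hotelling's T² of each row of X, computed in the space of the
    leading principal components (n_components is an assumed setting)."""
    Xc = X - X.mean(axis=0)                           # mean-centre each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)  # PCA via SVD
    scores = U[:, :n_components] * s[:n_components]   # PCA scores per sample
    var = s[:n_components] ** 2 / (X.shape[0] - 1)    # variance of each score
    return np.sum(scores ** 2 / var, axis=1)

def iqr(values):
    """Interquartile range, used here as a measure of spread."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Synthetic stand-in for a samples-by-variables spectral matrix (not real data).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(50, 200))
spectra[0] += 5.0                                     # one artificially shifted sample

t2 = hotelling_t2(spectra)
print("sample with largest T2:", int(np.argmax(t2)))
print("IQR of T2 values:", iqr(t2))
```

A narrower interquartile range of T² among replicate QC samples indicates that they sit closer together in the PCA score space, which is the sense in which the abstract reports improved precision after processing.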

