Majumder Erica L-W, Billings Elizabeth M, Benton H Paul, Martin Richard L, Palermo Amelia, Guijas Carlos, Rinschen Markus M, Domingo-Almenara Xavier, Montenegro-Burke J Rafael, Tagtow Bradley A, Plumb Robert S, Siuzdak Gary
Center for Mass Spectrometry and Metabolomics, The Scripps Research Institute, La Jolla, CA, USA.
IBM Watson Health, Cambridge, MA, USA.
Nat Protoc. 2021 Mar;16(3):1376-1418. doi: 10.1038/s41596-020-00455-4. Epub 2021 Jan 22.
Cognitive computing is revolutionizing the way big data are processed and integrated, with artificial intelligence (AI) natural language processing (NLP) platforms helping researchers to efficiently search and digest the vast scientific literature. Most available platforms have been developed for biomedical researchers, but new NLP tools are emerging for biologists in other fields; metabolomics is an important example. NLP provides literature-based contextualization of metabolic features, which decreases the time and expert-level subject knowledge required during the prioritization, identification and interpretation steps of the metabolomics data analysis pipeline. Here, we describe and demonstrate four workflows that combine metabolomics data with NLP-based literature searches of scientific databases to aid in the analysis of metabolomics data and their biological interpretation. The four procedures can be used in isolation or consecutively, depending on the research question. The first, used for initial metabolite annotation and prioritization, creates a list of metabolites that merit follow-up. The second finds literature evidence of the activity of metabolites and metabolic pathways in governing the biological condition at a systems biology level. The third identifies candidate biomarkers, and the fourth looks for metabolic conditions or drug-repurposing targets that two diseases have in common. The protocol can take 1-4 h or more to complete, depending on the processing time of the software used.