Generating the Blood Exposome Database Using a Comprehensive Text Mining and Database Fusion Approach.

Affiliation

National Institutes of Health (NIH) West Coast Metabolomics Center, Genome Center, University of California, Davis, Davis, California, USA.

Publication information

Environ Health Perspect. 2019 Sep;127(9):97008. doi: 10.1289/EHP4713. Epub 2019 Sep 26.

Abstract

BACKGROUND

Blood chemicals are routinely measured in clinical or preclinical research studies to diagnose diseases, assess risks in epidemiological research, or characterize metabolomic phenotypes in response to treatment. A vast body of blood-related literature is available in the PubMed database for data mining.

OBJECTIVES

We aimed to generate a comprehensive blood exposome database of endogenous and exogenous chemicals associated with the mammalian circulatory system through text mining and database fusion.

METHODS

Using NCBI resources, we retrieved PubMed abstracts, PubChem chemical synonyms, and PMC supplementary tables. We then employed text mining and PubChem crowdsourcing to associate blood-related phrases with PubChem chemicals. False positives were removed using a phrase pattern and a compound exclusion list.
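The abstract does not give the matching algorithm, the phrase pattern, or the exclusion list. As a minimal sketch of dictionary-based synonym matching, the example below assumes a hypothetical `BLOOD_PATTERN` regex, sentence-level co-occurrence with a blood term as the association rule, and a small illustrative synonym table keyed by PubChem CID. None of these is taken from the paper.

```python
import re
from collections import defaultdict

# Hypothetical blood-term pattern (an assumption; the paper's phrase pattern is not given).
BLOOD_PATTERN = re.compile(r"\b(blood|plasma|serum|erythrocytes?)\b", re.IGNORECASE)


def build_synonym_index(synonyms):
    """Map each lowercased synonym to its CIDs and report the longest synonym in tokens."""
    index = defaultdict(set)
    max_len = 1
    for cid, names in synonyms.items():
        for name in names:
            key = name.lower()
            index[key].add(cid)
            max_len = max(max_len, len(key.split()))
    return index, max_len


def match_abstract(abstract, index, max_len, exclusion):
    """Return CIDs whose synonyms occur in a sentence that also mentions a blood term."""
    hits = set()
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        if not BLOOD_PATTERN.search(sentence):
            continue  # skip sentences with no blood-related phrase
        # Strip surrounding punctuation but keep internal commas/hyphens (e.g. "2,3-butanediol").
        tokens = [t.strip(".,;:()[]").lower() for t in sentence.split()]
        tokens = [t for t in tokens if t]
        for i in range(len(tokens)):
            # Try every n-gram starting at token i, up to the longest synonym length.
            for n in range(1, min(max_len, len(tokens) - i) + 1):
                phrase = " ".join(tokens[i:i + n])
                for cid in index.get(phrase, ()):
                    if cid not in exclusion:  # drop over-generic compounds
                        hits.add(cid)
    return hits


# Illustrative synonym table (PubChem CIDs) and a hypothetical exclusion list.
synonyms = {
    5793: ["glucose", "D-glucose"],
    1983: ["acetaminophen", "paracetamol"],
    176: ["acetic acid"],
    5988: ["sucrose"],
    962: ["water"],
}
exclusion = {962}  # hypothetical: water matches far too often to be informative

index, max_len = build_synonym_index(synonyms)
abstract = ("Plasma glucose and paracetamol were measured in water. "
            "Sucrose was given orally. Serum acetic acid levels rose.")
print(sorted(match_abstract(abstract, index, max_len, exclusion)))  # [176, 1983, 5793]
```

Sucrose is not reported because its sentence has no blood term, and water is dropped by the exclusion list. These two filters are stand-ins for the kinds of false-positive removal the abstract mentions.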

RESULTS

A query to identify blood-related publications in the PubMed database yielded 1.1 million papers. Matching a total of 15 million synonyms from 6.5 million relevant PubChem chemicals against all blood-related publications yielded 37,514 chemicals and 851,999 publication records. Mapping PubChem compound identifiers to the PubMed database yielded 49,940 unique chemicals linked to 676,643 papers. Analysis of open-access metabolomics papers related to blood phrases in the PMC database yielded 4,039 unique compounds and 204 papers. Consolidating these three approaches yielded a total of 41,474 achiral structures that were linked to 65,957 PubChem CIDs and to over 878,966 PubMed articles. We mapped these compounds to 50 databases, including metabolite and pathway databases, governmental and toxicological databases, pharmacology resources, and bioassay repositories. In comparison, HMDB, the Human Metabolome Database, links 1,075 compounds to blood-related primary publications.
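The consolidation step can be pictured as merging per-source CID-to-PMID links and then collapsing CIDs that share an InChIKey connectivity block (the first 14 characters). That block ignores stereochemistry, so stereoisomers fall onto one "achiral" entry. Whether the authors used exactly this key is an assumption. The source names, CIDs, InChIKey strings, and PMIDs below are hypothetical placeholders.

```python
from collections import defaultdict


def consolidate(sources, inchikeys):
    """Merge per-source CID->PMID links into entries keyed by the InChIKey first block.

    sources: {source_name: {cid: set_of_pmids}}
    inchikeys: {cid: full InChIKey}
    """
    merged = defaultdict(lambda: {"cids": set(), "pmids": set(), "sources": set()})
    for source_name, links in sources.items():
        for cid, pmids in links.items():
            # The first 14-character block encodes connectivity only, so
            # stereoisomers collapse onto one "achiral" entry.
            skeleton = inchikeys[cid].split("-")[0]
            entry = merged[skeleton]
            entry["cids"].add(cid)
            entry["pmids"].update(pmids)
            entry["sources"].add(source_name)
    return dict(merged)


# Hypothetical CIDs, InChIKeys, and PMIDs; 101 and 102 stand in for two stereoisomers.
inchikeys = {
    101: "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    102: "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    103: "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
}
sources = {
    "abstract_text_mining": {101: {1, 2}},
    "cid_to_pubmed": {102: {2, 3}, 103: {4}},
    "pmc_supplementary_tables": {103: {5}},
}

db = consolidate(sources, inchikeys)
for skeleton, entry in sorted(db.items()):
    print(skeleton, sorted(entry["cids"]), sorted(entry["pmids"]))
# AAAAAAAAAAAAAA [101, 102] [1, 2, 3]
# DDDDDDDDDDDDDD [103] [4, 5]
```

Taking set unions means a paper found by two sources is counted once. This mirrors how overlapping source totals can shrink to a smaller consolidated count.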

CONCLUSION

This new Blood Exposome Database can be used for prioritizing chemicals for systematic reviews, developing targeted assays in exposome research, identifying compounds in untargeted mass spectrometry, and biologically interpreting metabolomics data. The database is available at http://bloodexposome.org. https://doi.org/10.1289/EHP4713.
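The abstract does not say how the database is queried during untargeted mass spectrometry. A common first step is an accurate-mass lookup. The sketch below makes three assumptions: protonated [M+H]+ ions, a 5 ppm tolerance, and a hypothetical `candidates` table of monoisotopic masses standing in for database entries.

```python
# [M+H]+ adduct: add the mass of a proton (in daltons).
PROTON_MASS = 1.007276


def annotate(observed_mz, candidates, ppm_tol=5.0):
    """Return candidate names whose [M+H]+ m/z lies within ppm_tol of observed_mz.

    candidates: {name: monoisotopic neutral mass in Da}. Results are ordered by
    absolute ppm error, then by name.
    """
    matches = []
    for name, mono_mass in candidates.items():
        theoretical_mz = mono_mass + PROTON_MASS
        ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
        if abs(ppm_error) <= ppm_tol:
            matches.append((abs(ppm_error), name))
    return [name for _, name in sorted(matches)]


# Illustrative candidate list (monoisotopic masses computed from elemental formulas).
candidates = {
    "glucose": 180.063388,        # C6H12O6
    "fructose": 180.063388,       # C6H12O6, an isomer of glucose
    "acetaminophen": 151.063329,  # C8H9NO2
}

print(annotate(181.0707, candidates))  # ['fructose', 'glucose']
print(annotate(152.0706, candidates))  # ['acetaminophen']
```

Glucose and fructose share a formula, so an m/z match alone cannot tell them apart. Retention time or MS/MS evidence would be needed to narrow such candidate lists.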
