Generating the Blood Exposome Database Using a Comprehensive Text Mining and Database Fusion Approach.

Affiliations

National Institutes of Health (NIH) West Coast Metabolomics Center, Genome Center, University of California, Davis, Davis, California, USA.

Publication information

Environ Health Perspect. 2019 Sep;127(9):97008. doi: 10.1289/EHP4713. Epub 2019 Sep 26.

DOI:10.1289/EHP4713

PMID:31557052

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6794490/

Abstract

BACKGROUND

Blood chemicals are routinely measured in clinical or preclinical research studies to diagnose diseases, assess risks in epidemiological research, or perform metabolomic phenotyping in response to treatments. A vast volume of blood-related literature is available via the PubMed database for data mining.

OBJECTIVES

We aimed to generate a comprehensive blood exposome database of endogenous and exogenous chemicals associated with the mammalian circulating system through text mining and database fusion.

METHODS

Using NCBI resources, we retrieved PubMed abstracts, PubChem chemical synonyms, and PMC supplementary tables. We then employed text mining and PubChem crowdsourcing to associate phrases relating to blood with PubChem chemicals. False positives were removed by a phrase pattern and a compound exclusion list.
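The matching step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the blood phrase list, the synonym table, the exclusion list, and the function name `link_chemicals` are all hypothetical stand-ins, and the simple word-boundary regex only approximates whatever phrase pattern the study used.

```python
import re
from collections import defaultdict

# Hypothetical inputs for illustration only; none of these lists come from the paper.
BLOOD_PHRASES = ["blood", "plasma", "serum"]
SYNONYMS = {  # lowercase synonym -> PubChem CID (illustrative values)
    "glucose": 5793,
    "caffeine": 2519,
    "lead": 5352425,
}
# Compound exclusion list: names such as "lead" are ambiguous in free text.
EXCLUDED_CIDS = {5352425}

BLOOD_RE = re.compile(
    r"\b(?:%s)\b" % "|".join(map(re.escape, BLOOD_PHRASES)), re.IGNORECASE
)


def link_chemicals(abstracts):
    """Return {CID: set of PMIDs} for abstracts naming a chemical and a blood phrase."""
    links = defaultdict(set)
    for pmid, text in abstracts.items():
        if not BLOOD_RE.search(text):
            continue  # abstract does not mention a blood phrase
        for synonym, cid in SYNONYMS.items():
            if cid in EXCLUDED_CIDS:
                continue  # drop known sources of false positives
            if re.search(r"\b%s\b" % re.escape(synonym), text, re.IGNORECASE):
                links[cid].add(pmid)
    return dict(links)
```

Calling `link_chemicals` on a dictionary of PMID-to-abstract strings returns only chemicals that co-occur with a blood phrase and are not on the exclusion list. At the scale reported here (millions of synonyms and abstracts), a per-synonym regex loop would be far too slow; an index or trie-based matcher would be the more realistic choice.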

RESULTS

A query to identify blood-related publications in the PubMed database yielded 1.1 million papers. Matching a total of 15 million synonyms from 6.5 million relevant PubChem chemicals against all blood-related publications yielded 37,514 chemicals and 851,999 publication records. Mapping PubChem compound identifiers to the PubMed database yielded 49,940 unique chemicals linked to 676,643 papers. Analysis of open-access metabolomics papers related to blood phrases in the PMC database yielded 4,039 unique compounds and 204 papers. Consolidating these three approaches yielded a total of 41,474 achiral structures that were linked to 65,957 PubChem CIDs and to over 878,966 PubMed articles. We mapped these compounds to 50 databases such as those covering metabolites and pathways, governmental and toxicological databases, pharmacology resources, and bioassay repositories. In comparison, HMDB, the Human Metabolome Database, links 1,075 compounds to blood-related primary publications.
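The consolidated totals are smaller than the sum of the three approaches because the sources overlap, and the structure count is smaller than the CID count because several CIDs (for example, stereoisomers) can share one achiral structure. The sketch below shows one way such a consolidation could work, assuming that achiral structures are grouped by the first block of the InChIKey, which encodes connectivity without stereochemistry. That grouping key, the function name `consolidate`, and the sample data are assumptions for illustration, not details taken from the abstract.

```python
from collections import defaultdict


def consolidate(sources, inchikeys):
    """Merge CID sets from several approaches and group them by achiral skeleton.

    sources   -- {approach name: set of CIDs}
    inchikeys -- {CID: InChIKey string}
    Returns {first InChIKey block: set of CIDs sharing that skeleton}.
    """
    # The union removes CIDs found by more than one approach.
    all_cids = set().union(*sources.values())
    structures = defaultdict(set)
    for cid in all_cids:
        skeleton = inchikeys[cid].split("-")[0]  # connectivity block only
        structures[skeleton].add(cid)
    return dict(structures)


# Made-up CIDs and InChIKeys, used only to show the counting behaviour.
sources = {
    "text_mining": {1, 2},
    "pubchem_links": {2, 3},
    "pmc_tables": {3, 4},
}
inchikeys = {
    1: "AAAAAAAAAAAAAA-BBBBBBBBBB-N",
    2: "AAAAAAAAAAAAAA-CCCCCCCCCC-N",  # same skeleton as CID 1
    3: "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
    4: "EEEEEEEEEEEEEE-UHFFFAOYSA-N",
}
structures = consolidate(sources, inchikeys)
```

In this toy input the three approaches report six CID hits in total, the union contains four distinct CIDs, and those collapse into three skeletons, mirroring how the reported 65,957 CIDs map to 41,474 achiral structures.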

CONCLUSION

This new Blood Exposome Database can be used for prioritizing chemicals for systematic reviews, developing targeted assays in exposome research, identifying compounds in untargeted mass spectrometry, and biological interpretation of metabolomics data. The database is available at http://bloodexposome.org. https://doi.org/10.1289/EHP4713.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9f2/6794490/97cbf20a5299/ehp-127-097008-g0001.jpg
