Novel application of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene sets associated with disease: use case in breast carcinogenesis.

Authors

Watford Sean M, Grashow Rachel G, De La Rosa Vanessa Y, Rudel Ruthann A, Friedman Katie Paul, Martin Matthew T

Affiliations

ORAU, contractor to U.S. Environmental Protection Agency through the National Student Services Contract, Oak Ridge, TN.

Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, UNC-Chapel Hill, Chapel Hill, North Carolina, United States.

Publication information

Comput Toxicol. 2018 Aug;7:46-57. doi: 10.1016/j.comtox.2018.06.003. Epub 2018 Jun 19.

DOI:10.1016/j.comtox.2018.06.003

PMID:32274464

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7144681/

Abstract

Advances in technology within biomedical sciences have led to an inundation of data across many fields, raising new challenges in how best to integrate and analyze these resources. For example, rapid chemical screening programs like the US Environmental Protection Agency's ToxCast and the collaborative effort, Tox21, have produced massive amounts of information on putative chemical mechanisms where assay targets are identified as genes; however, systematically linking these hypothesized mechanisms with toxicity endpoints like disease outcomes remains problematic. Herein we present a novel use of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene associations with biological concepts as represented by Medical Subject Headings (MeSH terms) in PubMed. Resources that tag genes to articles were integrated, then cross-species orthologs were identified using UniRef50 clusters. MeSH term frequency was normalized to reflect the MeSH tree structure, and then the resulting GeneID-MeSH associations were ranked using NPMI. The resulting network, called Entity MeSH Co-occurrence Network (EMCON), is a scalable resource for the identification and ranking of genes for a given topic of interest. The utility of EMCON was evaluated with the use case of breast carcinogenesis. Topics relevant to breast carcinogenesis were used to query EMCON and retrieve genes important to each topic. A breast cancer gene set was compiled through expert literature review (ELR) to assess performance of the search results. We found that the results from EMCON ranked the breast cancer genes from ELR higher than randomly selected genes with a recall of 0.98. Precision of the top five genes for selected topics was calculated as 0.87. 
This work demonstrates that EMCON can be used to link results to possible biological outcomes, thus aiding in generation of testable hypotheses for furthering understanding of biological function and the contribution of chemical exposures to disease.
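
The NPMI ranking step described in the abstract can be illustrated with a minimal sketch. It uses the standard definition of NPMI, which is the pointwise mutual information log(p(x,y) / (p(x) p(y))) divided by -log p(x,y), giving a score between -1 and 1. This is not the authors' EMCON implementation: the function names, the toy article records, and the use of gene symbols in place of GeneIDs are all assumptions made for illustration, and the sketch omits the UniRef50 ortholog grouping and the MeSH tree normalization of term frequency.

```python
import math
from collections import Counter


def npmi(n_xy, n_x, n_y, n_total):
    """Normalized pointwise mutual information from article counts.

    n_xy: articles tagged with both the gene and the MeSH term
    n_x, n_y: articles tagged with the gene / with the MeSH term
    n_total: all articles in the corpus
    Returns a value in [-1, 1]; -1 when the pair never co-occurs.
    """
    if n_xy == 0:
        return -1.0
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        # Both tags appear in every article; -log(1) is 0, so use the limit value.
        return 1.0
    p_x = n_x / n_total
    p_y = n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def rank_genes(articles, mesh_term):
    """Rank genes by NPMI with one MeSH term.

    articles: list of (set of gene ids, set of MeSH terms), one pair per article.
    This input layout is hypothetical, chosen only for the example.
    """
    n_total = len(articles)
    gene_counts = Counter()  # articles per gene
    co_counts = Counter()    # articles per gene that also carry mesh_term
    n_term = 0               # articles carrying mesh_term
    for genes, terms in articles:
        has_term = mesh_term in terms
        if has_term:
            n_term += 1
        for gene in genes:
            gene_counts[gene] += 1
            if has_term:
                co_counts[gene] += 1
    scores = {
        gene: npmi(co_counts[gene], gene_counts[gene], n_term, n_total)
        for gene in gene_counts
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus (invented for illustration, not data from the paper).
toy_articles = [
    ({"BRCA1"}, {"Breast Neoplasms"}),
    ({"BRCA1", "TP53"}, {"Breast Neoplasms"}),
    ({"TP53"}, {"Lung Neoplasms"}),
    ({"INS"}, {"Diabetes Mellitus"}),
]
ranked = rank_genes(toy_articles, "Breast Neoplasms")
for gene, score in ranked:
    print(f"{gene}\t{score:.2f}")
```

In this toy corpus the gene that appears only alongside the query term ranks first, a gene that co-occurs at chance level scores near zero, and a gene that never co-occurs scores -1. Unlike raw co-occurrence counts, the normalization keeps frequently tagged genes from dominating every topic.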

Similar articles

BCScreen: A gene panel to test for breast carcinogenesis in chemical safety screening.

Comput Toxicol. 2018 Feb;5:16-24. doi: 10.1016/j.comtox.2017.11.003. Epub 2017 Nov 21.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

REL-NPMI: Exploring genotype and phenotype relationship of pancreatitis based on improved normalized point-by-point mutual information.

Comput Biol Med. 2023 May;158:106868. doi: 10.1016/j.compbiomed.2023.106868. Epub 2023 Apr 4.

BioLitMine: Advanced Mining of Biomedical and Biological Literature About Human Genes and Genes from Major Model Organisms.

G3 (Bethesda). 2020 Dec 3;10(12):4531-4539. doi: 10.1534/g3.120.401775.

Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network.

BMC Immunol. 2011 Aug 26;12:49. doi: 10.1186/1471-2172-12-49.

The Minderoo-Monaco Commission on Plastics and Human Health.

Ann Glob Health. 2023 Mar 21;89(1):23. doi: 10.5334/aogh.4056. eCollection 2023.

Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

Database (Oxford). 2016 Mar 25;2016. doi: 10.1093/database/baw025. Print 2016.

Information content in Medline record fields.

Int J Med Inform. 2004 Jun 30;73(6):515-27. doi: 10.1016/j.ijmedinf.2004.02.008.

A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

BMC Syst Biol. 2013 Oct 16;7 Suppl 3(Suppl 3):S9. doi: 10.1186/1752-0509-7-S3-S9.

Cited by

Decoding Cellular Stress States for Toxicology Using Single-Cell Transcriptomics.

bioRxiv. 2025 Aug 2:2025.06.10.657506. doi: 10.1101/2025.06.10.657506.

Searching for LINCS to Stress: Using Text Mining to Automate Reference Chemical Curation.

Chem Res Toxicol. 2024 Jun 17;37(6):878-893. doi: 10.1021/acs.chemrestox.3c00335. Epub 2024 May 13.

An expert-driven literature review of "negative" chemicals for developmental neurotoxicity (DNT) in vitro assay evaluation.

Neurotoxicol Teratol. 2022 Sep-Oct;93:107117. doi: 10.1016/j.ntt.2022.107117. Epub 2022 Jul 29.

Environmental mixtures and breast cancer: identifying co-exposure patterns between understudied vs breast cancer-associated chemicals using chemical inventory informatics.

J Expo Sci Environ Epidemiol. 2022 Nov;32(6):794-807. doi: 10.1038/s41370-022-00451-8. Epub 2022 Jun 16.

Expert-Augmented Computational Drug Repurposing Identified Baricitinib as a Treatment for COVID-19.

Front Pharmacol. 2021 Jul 28;12:709856. doi: 10.3389/fphar.2021.709856. eCollection 2021.

A cross-platform approach to characterize and screen potential neurovascular unit toxicants.

Reprod Toxicol. 2020 Sep;96:300-315. doi: 10.1016/j.reprotox.2020.06.010. Epub 2020 Jun 24.

Progress in data interoperability to support computational toxicology and chemical safety evaluation.

Toxicol Appl Pharmacol. 2019 Oct 1;380:114707. doi: 10.1016/j.taap.2019.114707. Epub 2019 Aug 9.

ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses.

Reprod Toxicol. 2019 Oct;89:145-158. doi: 10.1016/j.reprotox.2019.07.012. Epub 2019 Jul 21.

BCScreen: A gene panel to test for breast carcinogenesis in chemical safety screening.

Comput Toxicol. 2018 Feb;5:16-24. doi: 10.1016/j.comtox.2017.11.003. Epub 2017 Nov 21.
