Environmental pathways affecting gene expression (E.PAGE) as an R package to predict gene-environment associations.

Affiliations

The University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, 37 Kent St, Woolloongabba, QLD, 4102, Australia.

Centre for Microscopy and Microanalysis, University of Queensland, St. Lucia, QLD, 4072, Australia.

Publication information

Sci Rep. 2022 Nov 4;12(1):18710. doi: 10.1038/s41598-022-21988-6.

Abstract

This study set out to curate a database, manually and semi-automatically, and to develop an R package that together serve as a comprehensive resource for understanding how biological processes are dysregulated through interactions with environmental factors. An initial search of the Gene Expression Omnibus and the Molecular Signatures Database retrieved 90,018 articles. After title and abstract screening against pre-set criteria, 237 datasets were selected and 522 gene modules were manually annotated. The curated database covers four environmental factors (cigarette smoking, diet, infections and toxic chemicals) and 25,789 genes, each associated with one or more gene modules. The database and the statistical analysis package were then tested with differentially expressed genes obtained from published literature on type 1 diabetes, rheumatoid arthritis, small cell lung cancer, COVID-19, cobalt exposure and smoking. These tests uncovered statistically enriched biological processes, revealing pathways associated with both the environmental factors and the tested genes. The curated database and the enrichment tool are available as R packages at https://github.com/AhmedMehdiLab/E.PATH and https://github.com/AhmedMehdiLab/E.PAGE, respectively.
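The abstract does not say which statistical test the enrichment tool applies. As an illustration only, the sketch below shows one common approach to gene-set enrichment, a hypergeometric over-representation test, written in Python rather than R. The function names, module names and gene identifiers are hypothetical and are not taken from E.PAGE or E.PATH.

```python
from math import comb


def hypergeom_upper_tail(k, universe, module_size, query_size):
    """P(X >= k) for X ~ Hypergeometric(universe, module_size, query_size)."""
    total = comb(universe, query_size)
    upper = min(module_size, query_size)
    favourable = sum(
        comb(module_size, i) * comb(universe - module_size, query_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def enrich(query_genes, modules, universe_size):
    """Rank annotated gene modules by over-representation in the query list.

    `modules` maps a module name to its gene set. This is an assumed data
    layout for the sketch, not the actual E.PATH schema.
    """
    query = set(query_genes)
    results = []
    for name, genes in modules.items():
        overlap = query & set(genes)
        p = hypergeom_upper_tail(len(overlap), universe_size, len(genes), len(query))
        results.append((name, len(overlap), p))
    # Smallest p-value first; no multiple-testing correction is applied here.
    return sorted(results, key=lambda r: r[2])


# Toy data: made-up gene identifiers in a universe of 20 genes.
toy_modules = {
    "toy_smoking_module": {"G1", "G2", "G3", "G4", "G5"},
    "toy_diet_module": {"G6", "G7", "G8", "G9", "G10"},
}
toy_query = ["G1", "G2", "G3", "G11"]
ranking = enrich(toy_query, toy_modules, universe_size=20)
for name, n_overlap, p in ranking:
    print(f"{name}: overlap={n_overlap}, p={p:.4f}")
```

In a real analysis the choice of background universe and a multiple-testing correction across the 522 modules would both matter; neither is handled in this sketch.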
