Prioritizing cancer hazard assessments for IARC Monographs using an integrated approach of database fusion and text mining.

Author information

Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mt Sinai, NY, USA.

Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, Lyon, France.

Publication information

Environ Int. 2021 Nov;156:106624. doi: 10.1016/j.envint.2021.106624. Epub 2021 May 10.

Abstract

BACKGROUND

Systematic evaluation of literature data on the cancer hazards of human exposures is an essential process underlying cancer prevention strategies. The scope and volume of evidence for suspected carcinogens can range from very few to thousands of publications, requiring a complex, systematically planned, and critical procedure to nominate, prioritize and evaluate carcinogenic agents. To aid in this process, database fusion, cheminformatics and text mining techniques can be combined into an integrated approach to inform agent prioritization, selection, and grouping.

RESULTS

We applied these techniques to agents recommended for the IARC Monographs evaluations during 2020-2024. We integrated PubMed filters to cover cancer epidemiology, key characteristics of carcinogens, chemical lists from 34 databases relevant for cancer research, chemical structure grouping, and literature data-based clustering, and applied this approach to 119 agents recommended by an advisory group for future IARC Monographs evaluations. The approach also facilitated a rational grouping of these agents and aided in understanding the volume and complexity of relevant information, as well as important gaps in the coverage of available studies on cancer etiology and carcinogenesis.
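To make the idea of combining database coverage with literature counts concrete, here is a minimal sketch. It is not the authors' scoring method: the `AgentEvidence` fields, the example counts, and the equal-weight average are all assumptions made purely for illustration. Only the figure of 34 databases comes from the abstract.

```python
from dataclasses import dataclass

TOTAL_DATABASES = 34  # number of chemical lists mentioned in the abstract


@dataclass
class AgentEvidence:
    """Hypothetical per-agent summary; field names are illustrative only."""
    name: str
    databases_listed: int      # how many of the chemical lists include the agent
    epidemiology_papers: int   # hits from a cancer-epidemiology literature filter
    mechanistic_papers: int    # hits from key-characteristics literature filters


def evidence_score(agent, max_epi, max_mech):
    """Average three normalized signals (equal weights are an assumption)."""
    db = agent.databases_listed / TOTAL_DATABASES
    epi = agent.epidemiology_papers / max_epi if max_epi else 0.0
    mech = agent.mechanistic_papers / max_mech if max_mech else 0.0
    return (db + epi + mech) / 3


def rank_agents(agents):
    """Return agents sorted from highest to lowest illustrative score."""
    max_epi = max((a.epidemiology_papers for a in agents), default=0)
    max_mech = max((a.mechanistic_papers for a in agents), default=0)
    return sorted(
        agents,
        key=lambda a: evidence_score(a, max_epi, max_mech),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    demo = [
        AgentEvidence("agent_x", 5, 12, 40),
        AgentEvidence("agent_y", 20, 300, 150),
        AgentEvidence("agent_z", 0, 0, 3),
    ]
    for agent in rank_agents(demo):
        print(agent.name)
```

A ranking like this shows only the volume of available information. An agent with very few publications scores low here, but it may still be a priority precisely because it reveals a gap in coverage.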

CONCLUSION

A new data-science approach has been applied to diverse agents recommended for cancer hazard assessments, and its applications for the IARC Monographs are demonstrated. The prioritization approach is available at www.cancer.idsl.me for ranking cancer agents.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ba7/8380673/bf28f9cef82f/nihms-1701145-f0001.jpg
