CarD-T: Interpreting Carcinomic Lexicon via Transformers.

Authors

O'Neill Jamey, Reddy Gudur Ashrith, Dhillon Nermeeta, Tripathi Osika, Alexandrov Ludmil, Katira Parag

Affiliations

Mechanical Engineering Department, San Diego State University, San Diego, CA, USA.

Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.

Publication

medRxiv. 2024 Aug 31:2024.08.13.24311948. doi: 10.1101/2024.08.13.24311948.

DOI:10.1101/2024.08.13.24311948
PMID:39185518
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11343268/
Abstract

The identification and classification of carcinogens is critical in cancer epidemiology, necessitating updated methodologies to manage the burgeoning biomedical literature. Current systems, like those run by the International Agency for Research on Cancer (IARC) and the National Toxicology Program (NTP), face challenges due to manual vetting and disparities in carcinogen classification spurred by the volume of emerging data. To address these issues, we introduced the Carcinogen Detection via Transformers (CarD-T) framework, a text analytics approach that combines transformer-based machine learning with probabilistic statistical analysis to efficiently nominate carcinogens from scientific texts. CarD-T uses Named Entity Recognition (NER) trained on PubMed abstracts featuring known carcinogens from IARC groups and includes a context classifier to enhance accuracy and manage computational demands. Using this method, we analyzed journal publication data from the last 25 years indexed with the carcinogenicity and carcinogenesis Medical Subject Headings (MeSH) terms, identifying potential carcinogens. When trained on 60% of established carcinogens (IARC Group 1 and 2A), CarD-T correctly identifies all of the remaining Group 1 and 2A carcinogens in the analyzed text. In addition, CarD-T nominates roughly 1,500 more entities as potential carcinogens, each with at least two publications citing evidence of carcinogenicity. Compared with the GPT-4 model, CarD-T achieves higher recall (0.857 vs 0.705) and F1 score (0.875 vs 0.792) with comparable precision (0.894 vs 0.903). Additionally, CarD-T highlights 554 entities with disputing evidence for carcinogenicity. These are further analyzed using Bayesian temporal Probabilistic Carcinogenic Denomination (PCarD) to provide probabilistic evaluations of their carcinogenic status based on evolving evidence.
Our findings underscore that the CarD-T framework is not only robust and effective in identifying and nominating potential carcinogens within the vast biomedical literature but also efficient on consumer GPUs. This integration of advanced NLP capabilities with vital epidemiological analysis significantly enhances the agility of public health responses to carcinogen identification, thereby setting a new benchmark for automated, scalable toxicological investigations.
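
The abstract reports precision, recall, and F1 for CarD-T and GPT-4, a nomination rule of at least two supporting publications, and a Bayesian temporal re-scoring of disputed entities. The Python sketch below is illustrative only and is not the authors' code. It checks that the reported F1 scores follow from the reported precision and recall (F1 = 2PR / (P + R)), aggregates hypothetical per-publication classifier labels into nominated and disputed entities, and substitutes a generic Beta-Binomial update for PCarD, whose actual formulation is not given here. All function names, the (entity, pmid, supports) input format, the "disputed = both supporting and refuting papers" definition, the Beta(1, 1) prior, and the entity_A/B/C toy data are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


@dataclass
class Evidence:
    # Sets of publication IDs, so repeated mentions in one paper count once.
    supporting: set = field(default_factory=set)
    refuting: set = field(default_factory=set)


def aggregate(mentions):
    """Group hypothetical (entity, pmid, supports) classifier outputs by entity."""
    table = defaultdict(Evidence)
    for entity, pmid, supports in mentions:
        bucket = table[entity].supporting if supports else table[entity].refuting
        bucket.add(pmid)
    return dict(table)


def nominate(table, min_supporting=2):
    """Entities backed by at least `min_supporting` distinct supporting publications."""
    return sorted(e for e, ev in table.items() if len(ev.supporting) >= min_supporting)


def disputed(table):
    """Entities with both supporting and refuting publications (assumed definition)."""
    return sorted(e for e, ev in table.items() if ev.supporting and ev.refuting)


def beta_trajectory(yearly_counts, prior_a=1.0, prior_b=1.0):
    """Generic Beta-Binomial stand-in for a temporal update (NOT the PCarD model).

    yearly_counts: (year, n_supporting, n_refuting) tuples sorted by year.
    Returns (year, posterior mean of P(carcinogenic)) after each year's evidence.
    """
    a, b = prior_a, prior_b
    trajectory = []
    for year, n_sup, n_ref in yearly_counts:
        a += n_sup
        b += n_ref
        trajectory.append((year, a / (a + b)))
    return trajectory


if __name__ == "__main__":
    # The reported precision/recall pairs reproduce the reported F1 scores.
    print(round(f1_score(0.894, 0.857), 3))  # CarD-T -> 0.875
    print(round(f1_score(0.903, 0.705), 3))  # GPT-4  -> 0.792

    toy = [
        ("entity_A", "pmid1", True), ("entity_A", "pmid2", True),
        ("entity_A", "pmid2", True),  # same paper twice, counted once
        ("entity_B", "pmid3", True),
        ("entity_C", "pmid4", True), ("entity_C", "pmid5", True),
        ("entity_C", "pmid6", False),
    ]
    table = aggregate(toy)
    print(nominate(table))  # ['entity_A', 'entity_C']
    print(disputed(table))  # ['entity_C']
    print(beta_trajectory([(2000, 1, 0), (2005, 0, 2), (2010, 3, 0)]))
```

Counting distinct publication IDs rather than raw mentions matches the abstract's "at least two publications" wording; the Beta-Binomial posterior mean is used only because it is the simplest cumulative Bayesian update, and PCarD may weight or model evidence differently.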

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d66/11370823/c0cbcd5b1c17/nihpp-2024.08.13.24311948v2-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d66/11370823/d962e701c511/nihpp-2024.08.13.24311948v2-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d66/11370823/9657998d8d84/nihpp-2024.08.13.24311948v2-f0002.jpg

Similar articles

1. CarD-T: Interpreting Carcinomic Lexicon via Transformers.
   medRxiv. 2024 Aug 31:2024.08.13.24311948. doi: 10.1101/2024.08.13.24311948.
2. Prescription of Controlled Substances: Benefits and Risks
3. Sexual Harassment and Prevention Training
4. Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
   Cochrane Database Syst Rev. 2020 Jan 9;1(1):CD011535. doi: 10.1002/14651858.CD011535.pub3.
5. Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
   Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.
6. Home treatment for mental health problems: a systematic review.
   Health Technol Assess. 2001;5(15):1-139. doi: 10.3310/hta5150.
7. Survivor, family and professional experiences of psychosocial interventions for sexual abuse and violence: a qualitative evidence synthesis.
   Cochrane Database Syst Rev. 2022 Oct 4;10(10):CD013648. doi: 10.1002/14651858.CD013648.pub2.
8. Systemic Inflammatory Response Syndrome
9. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
   Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
10. Interventions for promoting habitual exercise in people living with and beyond cancer.
    Cochrane Database Syst Rev. 2018 Sep 19;9(9):CD010192. doi: 10.1002/14651858.CD010192.pub3.
