Foundational model aided automatic high-throughput drug screening using self-controlled cohort study.

Authors

Xu Shenbo, Cobzaru Raluca, Finkelstein Stan N, Welsch Roy E, Ng Kenney, Middleton Lefkos

Affiliation

Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.

Publication

medRxiv. 2024 Sep 16:2024.08.04.24311480. doi: 10.1101/2024.08.04.24311480.

DOI:10.1101/2024.08.04.24311480
PMID:39148849
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11326319/
Abstract

BACKGROUND

Developing a medicine from scratch through to governmental authorization, and detecting adverse drug reactions (ADRs), have rarely been economical, expeditious, or low-risk investments. The availability of large-scale observational healthcare databases and the popularity of large language models offer an unparalleled opportunity to enable automatic high-throughput drug screening for both repurposing and pharmacovigilance.

OBJECTIVES

To demonstrate a general workflow for automatic high-throughput drug screening with the following advantages: (i) the association of various exposures with diseases can be estimated; (ii) both repurposing and pharmacovigilance are integrated; (iii) an accurate exposure length for each prescription is parsed from clinical texts; (iv) intrinsic relationships between drugs and diseases are removed jointly by bioinformatic mapping and a large language model (ChatGPT); (v) causal interpretations for incidence rate contrasts are provided.
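Item (iii) can be illustrated with a minimal sketch. The abstract does not give the authors' parser, so everything below is an assumption: the function name `exposure_days`, the small set of recognized phrasings, and the idea of dividing the prescribed quantity by the parsed daily dose are illustrative only.

```python
import re

# Hypothetical mapping from frequency phrases to doses per day.
_FREQ_PER_DAY = {"once": 1, "twice": 2, "three times": 3, "four times": 4}

# Hypothetical pattern covering phrasings such as "1 tablet twice daily".
_PATTERN = re.compile(
    r"(?P<dose>\d+(?:\.\d+)?)\s*"
    r"(?:tablets|tablet|capsules|capsule|tabs|tab|caps|cap)?\s*"
    r"(?P<freq>once|twice|three times|four times)\s*"
    r"(?:a day|daily|per day)",
    re.IGNORECASE,
)


def exposure_days(dosage_text, quantity):
    """Estimate exposure length in days from free-text dosage and quantity.

    Returns None when the text does not match the illustrative pattern.
    """
    match = _PATTERN.search(dosage_text)
    if match is None:
        return None
    units_per_day = float(match.group("dose")) * _FREQ_PER_DAY[match.group("freq").lower()]
    if units_per_day <= 0:
        return None
    return quantity / units_per_day


if __name__ == "__main__":
    print(exposure_days("TAKE 1 TABLET TWICE DAILY", 56))
    print(exposure_days("as directed", 28))
```

A real parser would need to handle far more phrasings (ranges, "when required", weekly schedules); this sketch only shows the shape of the free-text to structured-format step.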

METHODS

Using a self-controlled cohort study design in which subjects serve as their own control group, we tested the intention-to-treat association between medications and the incidence of diseases. The exposure length for each prescription is determined by parsing common dosages in English free text into a structured format. The exposure period runs from the initial prescription to treatment discontinuation. A period of the same length preceding the initial treatment serves as the control period. Clinical outcomes and categories are identified using existing phenotyping algorithms. Incidence rate ratios (IRRs) are tested using uniformly most powerful (UMP) unbiased tests.
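The IRR test can be sketched as follows, under stated assumptions. For two Poisson rates, the classical UMP unbiased test conditions on the total event count, so that the exposed-period count is binomial under the null. The abstract does not specify the authors' exact procedure; the code below is the common non-randomized version of that conditional test, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def _binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space for stability."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )
    return exp(log_pmf)


def irr_conditional_test(events_exposed, events_control, time_exposed, time_control):
    """Return (IRR, two-sided p-value) for exposed vs. control periods.

    Under H0 (equal rates), events_exposed given the total is
    Binomial(total, time_exposed / (time_exposed + time_control)).
    Both person-time arguments are assumed to be positive.
    """
    total = events_exposed + events_control
    p0 = time_exposed / (time_exposed + time_control)
    if events_control > 0:
        irr = (events_exposed / time_exposed) / (events_control / time_control)
    else:
        irr = float("inf")
    observed = _binom_pmf(events_exposed, total, p0)
    # Two-sided p-value: sum the probabilities of all outcomes that are
    # no more likely than the observed one (small tolerance for rounding).
    p_value = sum(
        prob
        for prob in (_binom_pmf(k, total, p0) for k in range(total + 1))
        if prob <= observed * (1 + 1e-7)
    )
    return irr, min(1.0, p_value)


if __name__ == "__main__":
    irr, p = irr_conditional_test(5, 15, 100.0, 100.0)
    print(f"IRR = {irr:.3f}, p = {p:.4f}")
```

In the self-controlled design described above, the exposure and control periods have equal length, so `p0` is 0.5 and the test reduces to an exact sign-type test on the event counts.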

RESULTS

We assessed 3,444 medications against 276 diseases in 6,613,198 patients from the Clinical Practice Research Datalink (CPRD), a UK primary care electronic health records (EHR) database spanning 1987 to 2018. Because of the built-in selection bias of self-controlled cohort studies, ingredient-disease pairs confounded by deterministic medical relationships are removed using existing mappings from RxNorm and, where no mapping exists, by calling ChatGPT. A total of 16,901 drug-disease pairs showed a significant risk reduction and can be considered candidates for repurposing, while a total of 11,089 pairs showed a significant risk increase, where drug safety might be a concern instead.

CONCLUSIONS

This work developed a data-driven, nonparametric, hypothesis-generating, and automatic high-throughput workflow, which reveals the potential of natural language processing in pharmacoepidemiology. We demonstrate the paradigm on a large observational health dataset to help discover potential novel therapies and adverse drug effects. The framework of this study can be extended to other observational medical databases.
