Needle in a haystack: Harnessing AI in drug patent searches and prediction.

Authors

Ribeiro Leonardo Costa, Muzaka Valbona

Affiliations

Departamento de Ciências Econômicas, Faculdade de Ciências Econômicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brasil.

Economic-History Department, Uppsala University, Uppsala, Sweden.

Publication information

PLoS One. 2024 Dec 2;19(12):e0311238. doi: 10.1371/journal.pone.0311238. eCollection 2024.

DOI: 10.1371/journal.pone.0311238
PMID: 39621674
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11611211/

Abstract

The classification codes granted by patent offices are useful instruments for simplifying the bewildering variety of patents in existence. They are singularly unhelpful, however, in locating a specific subgroup of patents such as that of drug-related pharmaceutical patents for which no classification codes exist. Taking advantage of advances in artificial intelligence and in natural language processing in particular, we offer a new method of identifying chemical drug-related patents in this article. The aim is primarily that of demonstrating how the proverbial needle in a haystack was identified, namely through leveraging the superb pattern-recognition abilities of the BERT (Bidirectional Encoder Representations from Transformers) algorithm. We build three different databases to train our algorithm and fine-tune its abilities to identify the patent group in question by exposing it to additional texts containing structures that are much more likely to be present in them, until we obtain the highest possible F1-score, combined with an accuracy of 94.40%. We also demonstrate some possible uses of the algorithm. Its application to the US patent office database enables the identification of potential chemical drug patents up to ten years before drug approval, whereas its application to the German patent office reveals the regional nature of drug R&D and patenting strategies. The hope is that both the method proposed and its applications will be further refined and expanded forthwith.
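The abstract reports an F1-score alongside accuracy. A minimal sketch of why both matter when the target class is rare is given below. The label lists are hypothetical toy data, not figures from the paper, and the function name `binary_metrics` is an illustrative assumption rather than anything the authors describe.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical toy data: 1 = drug-related patent, 0 = any other patent.
y_true = [1] * 5 + [0] * 95

# A classifier that never flags anything still looks accurate on rare classes.
always_negative = [0] * 100

# An imperfect classifier: finds 4 of the 5 positives, raises 2 false alarms.
imperfect = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

baseline = binary_metrics(y_true, always_negative)
candidate = binary_metrics(y_true, imperfect)

print(f"baseline : accuracy={baseline.accuracy:.2f} f1={baseline.f1:.2f}")
print(f"candidate: accuracy={candidate.accuracy:.2f} f1={candidate.f1:.2f}")
```

In this toy setting the always-negative baseline scores high on accuracy while its F1 is zero, which is why a rare-class search task is usually judged on F1 as well as accuracy.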

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b09/11611211/5bb2c180bdf1/pone.0311238.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b09/11611211/5291a7e1c3d2/pone.0311238.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b09/11611211/ce1cd63d451a/pone.0311238.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b09/11611211/b59610c335bd/pone.0311238.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b09/11611211/1c16de154587/pone.0311238.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b09/11611211/18d162519241/pone.0311238.g006.jpg
