ClinicalRisk: A New Therapy-related Clinical Trial Dataset for Predicting Trial Status and Failure Reasons.

Authors

Luo Junyu, Qiao Zhi, Glass Lucas, Xiao Cao, Ma Fenglong

Affiliations

The Pennsylvania State University, University Park, USA.

United Imaging Healthcare, Beijing, China.

Publication information

Proc ACM Int Conf Inf Knowl Manag. 2023 Oct;2023:5356-5360. doi: 10.1145/3583780.3615113. Epub 2023 Oct 21.

DOI:10.1145/3583780.3615113

PMID:38601744

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11005852/

Abstract

Clinical trials study new tests and evaluate their effects on human health outcomes, and they represent a huge market. However, carrying out clinical trials is expensive and time-consuming, and trials often end without results. An effective model that automatically estimates the status of a clinical trial and identifies possible failure reasons would revolutionize clinical practice. Developing such a model is challenging, however, because no benchmark dataset exists. To address this challenge, we first build a new dataset by extracting publicly available clinical trial reports from ClinicalTrials.gov. The status associated with each report is treated as the status label. To analyze failure reasons, domain experts manually annotate each failed report based on its associated description. More importantly, we examine several state-of-the-art text classification baselines on this task and find that the unique format of clinical trial protocols plays an essential role in prediction accuracy, demonstrating the need for specially designed clinical trial classification models.
