Luo Junyu, Qiao Zhi, Glass Lucas, Xiao Cao, Ma Fenglong
The Pennsylvania State University, University Park, USA.
United Imaging Healthcare, Beijing, China.
Proc ACM Int Conf Inf Knowl Manag. 2023 Oct;2023:5356-5360. doi: 10.1145/3583780.3615113. Epub 2023 Oct 21.
Clinical trials study new tests and evaluate their effects on human health outcomes, and they account for a very large market. However, clinical trials are expensive and time-consuming to carry out, and many end without producing results. An effective model that automatically estimates the status of a clinical trial and identifies likely reasons for failure could transform clinical practice. Developing such a model is challenging, however, because of the lack of a benchmark dataset. To address this challenge, we first build a new dataset by extracting publicly available clinical trial reports from ClinicalTrials.gov and treating the status associated with each report as its label. To support analysis of failure reasons, domain experts manually annotated each failed report based on its accompanying description. More importantly, we evaluate several state-of-the-art text classification baselines on this task and find that the distinctive format of clinical trial protocols strongly affects prediction accuracy, demonstrating the need for classification models designed specifically for clinical trials.
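The labeling step described above can be sketched in code. The example pairs each report's text with a label derived from its status and then fits a simple text classifier. It is a minimal illustration under stated assumptions, not the authors' pipeline:

- The record fields `status` and `text` are hypothetical.
- Grouping "Terminated", "Withdrawn", and "Suspended" as failed and "Completed" as completed is an assumed label scheme.
- Ongoing trials are dropped because they have no final outcome yet.
- The multinomial naive Bayes model is a deliberately simple stand-in, not one of the state-of-the-art baselines the paper evaluates.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical grouping of trial status values into labels (assumption,
# not the paper's confirmed label set).
FAILED = {"terminated", "withdrawn", "suspended"}
SUCCEEDED = {"completed"}


def status_label(status):
    """Map a trial status to 'failed', 'completed', or None (ongoing/unknown)."""
    s = status.strip().lower()
    if s in FAILED:
        return "failed"
    if s in SUCCEEDED:
        return "completed"
    return None  # e.g. recruiting trials have no final outcome yet


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_dataset(records):
    """Pair each report's text with its status label, dropping unlabeled trials."""
    pairs = [(r["text"], status_label(r["status"])) for r in records]
    return [(text, label) for text, label in pairs if label is not None]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a simple stand-in baseline)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


# Toy records with hypothetical fields; real reports would come from ClinicalTrials.gov.
records = [
    {"status": "Completed", "text": "Phase 3 trial, enrollment met, primary endpoint assessed"},
    {"status": "Terminated", "text": "Stopped early due to slow enrollment"},
    {"status": "Withdrawn", "text": "Withdrawn before enrollment, funding lost"},
    {"status": "Recruiting", "text": "Currently enrolling participants"},
]
data = build_dataset(records)
texts, labels = zip(*data)
model = NaiveBayes().fit(texts, labels)
print(model.predict("trial stopped due to slow enrollment"))  # prints "failed"
```

The recruiting trial is excluded from `data` because its outcome is unknown. Per the abstract, the format of the protocol text matters for accuracy, so a bag-of-words baseline like this one discards exactly the structure a specialized model would need to exploit.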