Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
J Biomed Inform. 2018 Mar;79:32-40. doi: 10.1016/j.jbi.2018.01.008. Epub 2018 Feb 2.
Clinical trial registries can be used to monitor the production of trial evidence and signal when systematic reviews become out of date. To date, however, this use has been limited by the extensive manual review required to search for and screen relevant trial registrations. Our aim was to evaluate a new method that could partially automate the identification of trial registrations that may be relevant for systematic review updates.
We identified 179 systematic reviews of drug interventions for type 2 diabetes, which included 537 clinical trials that had registrations in ClinicalTrials.gov. Text from the trial registrations was used as features directly, or transformed using Latent Dirichlet Allocation (LDA) or Principal Component Analysis (PCA). We tested a novel matrix factorisation approach that uses a shared latent space to learn how to rank relevant trial registrations for each systematic review, and compared its performance with a baseline that ranks trial registrations by document similarity. The two approaches were tested on a holdout set of the newest trials from the set of type 2 diabetes systematic reviews and on an unseen set of 141 clinical trial registrations from 17 updated systematic reviews published in the Cochrane Database of Systematic Reviews. Performance was measured by the proportion of relevant registrations found after examining 100 candidates (recall@100) and by the median rank of relevant registrations in the ranked candidate lists.
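The two evaluation measures and the document similarity baseline can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain bag-of-words counts with cosine similarity as the similarity measure, and the registration identifiers, texts, and function names are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter
from statistics import median


def bag_of_words(text):
    """Lowercased token counts; a stand-in (assumption) for the study's text features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(review_text, registrations):
    """Return registration IDs sorted from most to least similar to the review."""
    query = bag_of_words(review_text)
    scores = {rid: cosine(query, bag_of_words(text))
              for rid, text in registrations.items()}
    # Ties are broken by ID so the ordering is deterministic.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


def recall_at_k(ranking, relevant, k=100):
    """Fraction of relevant registrations found among the first k candidates."""
    return len(set(ranking[:k]) & set(relevant)) / len(relevant)


def median_rank(ranking, relevant):
    """Median 1-based position of the relevant registrations in the ranking."""
    position = {rid: i + 1 for i, rid in enumerate(ranking)}
    return median(position[rid] for rid in relevant)


# Toy data, invented for illustration only (not taken from the study).
registrations = {
    "NCT-A": "metformin versus placebo in type 2 diabetes",
    "NCT-B": "sitagliptin add-on to metformin in type 2 diabetes",
    "NCT-C": "statin therapy after myocardial infarction",
    "NCT-D": "exercise programme for chronic low back pain",
}
review = "metformin for glycaemic control in type 2 diabetes"
relevant = ["NCT-A", "NCT-B"]

ranking = rank_candidates(review, registrations)
print(ranking)
print(recall_at_k(ranking, relevant, k=2), median_rank(ranking, relevant))
```

In the study the candidate list contained 128,392 registrations and the cut-off was 100, so a median rank of 59 means that half of the relevant registrations appeared within the first 59 candidates screened.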
The matrix factorisation approach outperformed the document similarity approach, with a median rank of 59 (of 128,392 candidate registrations in ClinicalTrials.gov) and recall@100 of 60.9% using the LDA feature representation, compared to a median rank of 138 and recall@100 of 42.8% for the document similarity baseline. In the second set of systematic reviews and their updates, the highest performing approach used document similarity and gave a median rank of 67 (recall@100 of 62.9%).
A shared latent space matrix factorisation method was useful for ranking trial registrations to reduce the manual workload associated with finding relevant trials for systematic review updates. The results suggest that the approach could be used as part of a semi-automated pipeline for monitoring potentially new evidence for inclusion in a review update.