Faculty of Medicine and Health, The University of Sydney, Biomedical Informatics and Digital Health, School of Medical Sciences, Sydney, New South Wales, Australia.
Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA.
Res Synth Methods. 2022 May;13(3):342-352. doi: 10.1002/jrsm.1545. Epub 2022 Jan 23.
A substantial proportion of trial registrations are not linked to corresponding published articles, limiting analyses and new tools. Our aim was to develop a method for finding articles that report the results of trials registered on ClinicalTrials.gov when metadata links are missing. We used a set of 27,280 trial registration and article pairs to train and evaluate methods for identifying missing links in both directions: from articles to registrations and from registrations to articles. We trained a classifier with six distance metrics as feature representations to rank the correct article or registration, and used recall@K to evaluate performance and compare it with baseline methods. When identifying links from registrations to published articles, the classifier ranked the correct article first (recall@1) among 378,048 articles in 80.8% of evaluation cases, compared to 34.9% for the baseline method. Recall@10 was 85.1%, compared to 60.7% for the baseline. When predicting links from articles to registrations, recall@1 was 83.4% for the classifier and 39.8% for the baseline; recall@10 was 89.5%, compared to 65.8% for the baseline. The proposed method improves on our baseline document similarity method enough to be feasible for identifying missing links in practice. Given a ClinicalTrials.gov registration, a user checking 10 ranked articles can expect to identify the matching article in at least 85% of cases, if the trial has been published. The proposed method can be used to improve the coupling of ClinicalTrials.gov and PubMed, with applications in automating systematic review and evidence synthesis processes.
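The recall@K figures above can be read as the fraction of queries (registrations or articles) whose true match appears among the top K ranked candidates. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `recall_at_k`, the dictionary layout, and the toy identifiers (`reg_a`, `art_1`, and so on) are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def recall_at_k(ranked: Dict[str, Sequence[str]], truth: Dict[str, str], k: int) -> float:
    """Return the fraction of queries whose true match is in the top-k candidates.

    ranked: query id -> candidate ids, ordered best first (hypothetical layout).
    truth:  query id -> id of the single correct match.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not truth:
        raise ValueError("truth must contain at least one query")
    # A query counts as a hit when its correct match is within the first k candidates.
    hits = sum(
        1 for query, target in truth.items()
        if target in ranked.get(query, [])[:k]
    )
    return hits / len(truth)


# Toy data only: four registrations, each with three ranked candidate articles.
ranked = {
    "reg_a": ["art_1", "art_7", "art_3"],    # correct article ranked first
    "reg_b": ["art_9", "art_2", "art_4"],    # correct article ranked second
    "reg_c": ["art_5", "art_6", "art_8"],    # correct article ranked third
    "reg_d": ["art_0", "art_11", "art_12"],  # correct article not retrieved
}
truth = {"reg_a": "art_1", "reg_b": "art_2", "reg_c": "art_8", "reg_d": "art_10"}

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(ranked, truth, k):.2f}")
```

On this toy data the sketch gives recall@1 = 0.25, recall@2 = 0.50 and recall@3 = 0.75. Queries whose correct match is never retrieved, such as `reg_d` here, stay in the denominator, so recall@K cannot reach 1.0 however large K becomes.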