Holt Arthur M, Troy Ang Michael, Smalheiser Neil R
Department of Psychiatry, University of Illinois College of Medicine, Chicago, IL, 60612, USA.
Trials. 2025 Jan 31;26(1):34. doi: 10.1186/s13063-025-08741-w.
Linking registered clinical trials with their published results continues to be a challenge. A variety of natural language processing (NLP)-based and machine learning-based models have been developed to assist users in identifying these connections. To date, however, no system has attempted to detect mentions of registry numbers within the full text of articles.
Articles from the PubMed Central full-text Open Access dataset were scanned for mentions of ClinicalTrials.gov and international clinical trial registry identifiers. We analyzed the distribution of trial registry numbers within sections of the articles and characterized their publication type indexing and other metrics.
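The scanning step described above can be illustrated with a minimal sketch. The abstract does not give the authors' actual patterns or pipeline, so everything here is an assumption: the function name `find_registry_ids`, the section labels, and the choice to cover only two registries. The sketch relies only on the well-known identifier shapes: ClinicalTrials.gov identifiers are `NCT` followed by eight digits, and ISRCTN identifiers are `ISRCTN` followed by eight digits.

```python
import re
from collections import defaultdict

# Illustrative patterns only (an assumption, not the authors' patterns).
# Other international registries would need their own entries.
REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "ISRCTN": re.compile(r"\bISRCTN\d{8}\b"),
}


def find_registry_ids(sections):
    """Map each registry identifier to the set of section names mentioning it.

    `sections` is a hypothetical dict of section name -> section text,
    e.g. produced by splitting a full-text article into its parts.
    """
    hits = defaultdict(set)
    for section_name, text in sections.items():
        for pattern in REGISTRY_PATTERNS.values():
            for identifier in pattern.findall(text):
                hits[identifier].add(section_name)
    return dict(hits)


# Usage with a made-up article split into sections.
example_sections = {
    "abstract": "This trial was registered as NCT01234567.",
    "methods": "Participants were enrolled under NCT01234567 and ISRCTN12345678.",
    "tables": "Included studies: NCT07654321.",
}
example_hits = find_registry_ids(example_sections)
```

Recording the section alongside each identifier keeps the location information that the analysis of registry number distribution depends on.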
Registry numbers mentioned in article metadata (e.g., the abstract) or in the Methods section of the full text are highly predictive of clinical trial articles. When a clinical trial article mentioned ClinicalTrials.gov identifier numbers (NCT) only in the Methods section, in every case examined it was reporting clinical outcomes from that registered trial; such mentions can thus reliably be used to link that trial to that publication. Conversely, registry numbers mentioned in Tables arise almost entirely from reviews (including systematic reviews and meta-analyses). Registry numbers mentioned in other full-text sections have relatively little predictive value for linking trials to their publications. Clinical trial articles that mention CONSORT or SPIRIT guidelines have a higher rate of mentioning registry numbers in article metadata, and hence are more easily linked to their underlying trials, than articles overall.
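One way to read these findings is as a location-based heuristic. The sketch below is only an illustration of that reading, not the authors' method: the function name `link_signal`, the section labels, the output labels, and the precedence given to metadata and Methods mentions over Tables mentions are all assumptions.

```python
# Hypothetical section labels; real full-text articles would need mapping to these.
STRONG_LOCATIONS = {"metadata", "abstract", "methods"}


def link_signal(sections_mentioning_id):
    """Return a coarse label for how a registry number mention might be used.

    "strong"        - mentioned in metadata/abstract or Methods
    "likely_review" - mentioned in tables but not in a strong location
    "weak"          - mentioned only in other full-text sections
    "none"          - not mentioned at all
    """
    locations = {name.lower() for name in sections_mentioning_id}
    if not locations:
        return "none"
    if locations & STRONG_LOCATIONS:
        return "strong"
    if "tables" in locations:
        return "likely_review"
    return "weak"


# Usage with made-up inputs.
methods_only = link_signal({"Methods"})
tables_only = link_signal({"Tables"})
discussion_only = link_signal({"Discussion"})
```

Such a label would be a feature for a downstream linking model rather than a decision in its own right, since the Methods-only result above was reported specifically for articles that are clinical trial articles.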
The appearance and location of trial registry numbers within the full text of biomedical articles provide valuable features for connecting clinical trials to their publications. They may also provide information to assist automated tools in assigning publication types to articles.