School of Population Health and Environmental Sciences, King's College London, London, United Kingdom.
Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, USA.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1903-1912. doi: 10.1093/jamia/ocaa163.
Randomized controlled trials (RCTs) are the gold standard method for evaluating whether a treatment works in health care but can be difficult to find and make use of. We describe the development and evaluation of a system to automatically find and categorize all new RCT reports.
Trialstreamer continuously monitors PubMed and the World Health Organization International Clinical Trials Registry Platform, looking for new RCTs in humans using a validated classifier. We combine machine learning and rule-based methods to extract information from the RCT abstracts, including free-text descriptions of trial PICO (populations, interventions/comparators, and outcomes) elements, and map these snippets to normalized MeSH (Medical Subject Headings) vocabulary terms. We additionally identify sample sizes, predict the risk of bias, and extract text conveying key findings. We store all extracted data in a database, which we make freely available both for download and through a search portal that allows users to enter structured clinical queries. Results are ranked automatically to prioritize larger and higher-quality studies.
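The ranking step can be illustrated with a minimal sketch. The abstract does not state the actual scoring rule, so the ordering used here (low predicted risk of bias first, then larger sample size) is an assumption, and the `TrialRecord` fields and `rank_trials` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrialRecord:
    """Hypothetical record holding fields extracted from one RCT abstract."""
    identifier: str          # e.g. a PubMed ID (illustrative only)
    sample_size: int         # number of participants extracted from the abstract
    low_risk_of_bias: bool   # True if the bias model predicts low risk


def rank_trials(trials: List[TrialRecord]) -> List[TrialRecord]:
    """Order trials so that higher-quality, larger studies come first.

    Assumed rule (not taken from the paper): trials predicted to be at low
    risk of bias precede the rest; within each group, larger samples rank
    higher.
    """
    return sorted(
        trials,
        key=lambda t: (t.low_risk_of_bias, t.sample_size),
        reverse=True,
    )


# Made-up example records, not data from the Trialstreamer database.
example_trials = [
    TrialRecord("trial-a", sample_size=66, low_risk_of_bias=False),
    TrialRecord("trial-b", sample_size=40, low_risk_of_bias=True),
    TrialRecord("trial-c", sample_size=500, low_risk_of_bias=True),
]
ranked = rank_trials(example_trials)
```

A tuple sort key keeps the two criteria separate, so a weighted score or additional quality signals could be substituted without changing the calling code.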
As of early June 2020, we have indexed 673 191 publications of RCTs, of which 22 363 were published in the first 5 months of 2020 (142 per day). We additionally include 304 111 trial registrations from the International Clinical Trials Registry Platform. The median trial sample size was 66.
We present an automated system for finding and categorizing RCTs. This yields a novel resource: a database of structured information automatically extracted for all published RCTs in humans. We make daily updates of this database available on our website (https://trialstreamer.robotreviewer.net).