de Bruijn Berry, Carini Simona, Kiritchenko Svetlana, Martin Joel, Sim Ida
Institute for Information Technology, National Research Council, Ottawa, Ontario, Canada.
AMIA Annu Symp Proc. 2008 Nov 6;2008:141-5.
Clinical trials are one of the most valuable sources of scientific evidence for improving the practice of medicine. The Trial Bank project aims to improve structured access to trial findings by incorporating formalized trial information into a knowledge base. Manually extracting trial information from published articles is costly, but automated information extraction techniques can assist. This study describes a single architecture for extracting a wide array of information elements from full-text publications of randomized clinical trials (RCTs). The architecture combines a text classifier with a weak regular expression matcher. We tested this two-stage architecture on 88 RCT reports from 5 leading medical journals, extracting 23 elements of key trial information such as eligibility rules, sample size, intervention, and outcome names. The results suggest that this is a promising avenue for helping critical appraisers, systematic reviewers, and curators quickly identify key information elements in published RCT articles.
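The two-stage idea (a text classifier that narrows the search to likely sentences, followed by a weak regular expression matcher that pulls out the value) can be illustrated with a minimal sketch. This is not the authors' implementation: the cue-word scorer is a hypothetical stand-in for a trained classifier, the pattern covers only one element (sample size), and every name below is an assumption made for illustration.

```python
import re

# Hypothetical cue words standing in for a trained sentence classifier.
SAMPLE_SIZE_CUES = {
    "randomized", "randomised", "enrolled", "recruited",
    "participants", "patients",
}

# Deliberately loose ("weak") pattern: an integer followed by a population noun.
SAMPLE_SIZE_PATTERN = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects)\b", re.IGNORECASE
)


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, cues=SAMPLE_SIZE_CUES):
    """Stage 1 stand-in: count distinct cue words present in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(tokens & cues)


def extract_sample_size(text, top_k=1):
    """Stage 2: apply the weak pattern only to the top-ranked sentences."""
    ranked = sorted(split_sentences(text), key=score_sentence, reverse=True)
    for sentence in ranked[:top_k]:
        if score_sentence(sentence) == 0:
            continue  # no evidence that this sentence reports a sample size
        match = SAMPLE_SIZE_PATTERN.search(sentence)
        if match:
            return int(match.group(1).replace(",", ""))
    return None


report = (
    "Blood pressure was measured at 12 weeks in 40 patients. "
    "We enrolled and randomized 1,250 patients at five sites. "
    "Follow-up lasted two years."
)
print(extract_sample_size(report))
```

In this example the weak pattern on its own would also match "40 patients" in the first sentence; ranking sentences first is what lets a loose pattern remain usable, which is the rationale for combining the two stages.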