Department of Computer Science, The University of Iowa, 14 MacLean Hall, Iowa City, IA 52242, United States.
J Biomed Inform. 2013 Oct;46(5):805-13. doi: 10.1016/j.jbi.2013.06.001. Epub 2013 Jun 12.
Previous research on the standardization of eligibility criteria and its feasibility has traditionally been conducted on clinical trial protocols from ClinicalTrials.gov (CT). The portability and use of such standardization for full-text, industry-standard protocols have not been studied in depth. Towards this end, in this study we first compare the representation characteristics and textual complexity of a set of Pfizer's internal full-text protocols with their corresponding entries in CT. Next, we identify clusters of similar criteria sentences from both full-text and CT protocols and outline methods for the standardized representation of eligibility criteria. We also study the distribution of eligibility criteria in full-text and CT protocols with respect to pre-defined semantic classes used for eligibility criteria classification. We find that, compared with full-text protocols, CT protocols are not only more condensed but also convey less information. We also find no correlation between the variations in word counts of the ClinicalTrials.gov and full-text protocols. While we identify 65 inclusion-criteria clusters and 103 exclusion-criteria clusters from the full-text protocols, our methods found only 36 and 63 corresponding clusters, respectively, from the CT protocols. For both the full-text and CT protocols, we are able to identify 'templates' for standardized representations, with full-text standardization being the more challenging of the two. In our exploration of the semantic class distributions, we find that the majority of the inclusion criteria from both full-text and CT protocols belong to the semantic class "Diagnostic and Lab Results", while "Disease, Sign or Symptom" forms the majority for exclusion criteria. Overall, we show that developing a template set of eligibility criteria for clinical trials, specifically in their full-text form, is feasible and could lead to more efficient clinical trial protocol design.