National Center for Text Mining & School of Computer Science, The University of Manchester, Manchester, M1 7DN, UK.
BMC Med Inform Decis Mak. 2013;13 Suppl 1(Suppl 1):S6. doi: 10.1186/1472-6947-13-S1-S6. Epub 2013 Apr 5.
We consider the user task of designing clinical trial protocols and propose a method that discovers and outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D' (|D'| ≪ |D|) that a user has initially identified as relevant, e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D (D ⊃ D') by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. Appropriateness is measured by the degree to which the criteria are consistent with the user-supplied sample documents D'.
We propose a novel three-step method called LDALR which views documents as a mixture of latent topics. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA). Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score.
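The third step, scoring, can be illustrated with a minimal sketch. It assumes the first two steps have already been run: `theta` stands in for the topic proportions that LDA inferred from the sample documents, and each candidate carries one probability per topic, as a logistic regression model might produce. The function names, the labels "criterion A" to "criterion C", and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def expected_score(topic_probs: List[float], topic_proportions: List[float]) -> float:
    """Probability-weighted sum of the sample topic proportions for one criterion.

    topic_probs[t]       : assumed probability that the criterion belongs to topic t
    topic_proportions[t] : assumed proportion of topic t in the sample documents
    """
    if len(topic_probs) != len(topic_proportions):
        raise ValueError("one probability is needed per topic")
    return sum(p * theta_t for p, theta_t in zip(topic_probs, topic_proportions))


def rank_criteria(candidates: Dict[str, List[float]],
                  topic_proportions: List[float]) -> List[str]:
    """Return candidate labels ordered from highest to lowest expected score."""
    return sorted(
        candidates,
        key=lambda label: expected_score(candidates[label], topic_proportions),
        reverse=True,
    )


# Hypothetical topic proportions for three topics; topic 0 dominates the samples.
theta = [0.7, 0.2, 0.1]

# Hypothetical per-topic probabilities for three candidate criteria.
candidates = {
    "criterion A": [0.9, 0.05, 0.05],  # mostly the dominant topic
    "criterion B": [0.1, 0.1, 0.8],    # mostly a minor topic
    "criterion C": [0.3, 0.6, 0.1],    # mostly the second topic
}

ranking = rank_criteria(candidates, theta)
print(ranking)
```

With these illustrative numbers, the candidate that most probably belongs to the dominant topic ranks first, which matches the intuition stated above.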
Our experiments show that LDALR performs 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments using LDALR, we were able to automatically construct eligibility criteria that are on average 75% and 70% similar (for inclusion and exclusion criteria, respectively) to the correct eligibility criteria.
We have proposed LDALR, a practical method for discovering and inferring appropriate eligibility criteria in clinical trial protocols without labeled data. Results from our experiments suggest that LDALR models can be used to effectively find appropriate eligibility criteria from a large repository of clinical trial protocols.