Boland M R, Miotto R, Gao J, Weng C
Chunhua Weng, PhD, Florence Irving Assistant Professor, Department of Biomedical Informatics, Columbia University, 622 W 168th Street, VC-5, New York, NY 10032, USA, E-mail:
Methods Inf Med. 2013;52(5):382-94. doi: 10.3414/ME12-01-0092. Epub 2013 May 13.
When standard therapies fail, clinical trials provide experimental treatment opportunities for patients with drug-resistant illnesses or terminal diseases. Clinical trials can also provide free treatment and education for individuals who otherwise may not have access to such care. To find relevant clinical trials, patients often search online; however, they frequently encounter a significant barrier: the number of trials is large, and existing indexing methods are ineffective at reducing the trial search space.
This study explores the feasibility of feature-based indexing, clustering, and search of clinical trials and informs designs to automate these processes.
We decomposed each of 80 randomly selected stage III breast cancer clinical trials into a vector of eligibility features, and organized the features into a hierarchy. We clustered trials based on their eligibility feature similarities. In a simulated search process, manually selected features were used to generate specific eligibility questions to filter trials iteratively.
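The clustering step can be illustrated with a minimal sketch. The abstract does not state which similarity measure or clustering algorithm was used, so the Jaccard similarity, the 0.5 threshold, and the single-linkage grouping below are assumptions chosen for illustration, and the trial identifiers and features are hypothetical toy data rather than data from the study.

```python
from itertools import combinations

# Hypothetical toy data: trial id -> set of eligibility features.
# In vector form, each feature is one binary dimension.
trials = {
    "T1": {"age>=18", "female", "ER-positive", "prior chemotherapy"},
    "T2": {"age>=18", "female", "ER-positive"},
    "T3": {"age>=18", "HER2-positive", "no prior radiotherapy"},
    "T4": {"HER2-positive", "no prior radiotherapy", "adequate renal function"},
}


def jaccard(a, b):
    """Shared features divided by all features present in either trial."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(trials, threshold=0.5):
    """Group trials whose similarity reaches the threshold (assumed value).

    Single-linkage grouping: two trials end up in the same cluster if a
    chain of sufficiently similar trials connects them.
    """
    parent = {t: t for t in trials}

    def find(t):
        # Follow parent links to the cluster representative.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(trials, 2):
        if jaccard(trials[a], trials[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in trials:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(trials)
print(clusters)
```

With this toy data, the two ER-positive trials and the two HER2-positive trials fall into separate clusters; a lower threshold would merge them through the shared age criterion.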
We extracted 1,437 distinct eligibility features and achieved an inter-rater agreement of 0.73 for feature extraction on the 37 frequent features that occurred in more than 20 trials. Using all 1,437 features, we stratified the 80 trials into clusters of trials recruiting similar patients: six clusters by patient-characteristic features, five clusters by disease-characteristic features, and two clusters by mixed features. Most of the features were mapped to one or more Unified Medical Language System (UMLS) concepts, demonstrating the utility of named entity recognition prior to mapping with the UMLS for automatic feature extraction.
It is feasible to develop feature-based indexing and clustering methods for clinical trials to identify trials with similar target populations and to improve trial search efficiency.
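The iterative, question-driven search described in the methods might look like the following sketch. It assumes that each trial lists required features, that a "no" answer rules out every trial requiring that feature, and that a "yes" or missing answer rules out nothing; these rules, the function name `simulated_search`, and the toy trials are hypothetical and not taken from the study.

```python
# Hypothetical toy data: trial id -> set of required eligibility features.
trials = {
    "T1": {"female", "ER-positive", "prior chemotherapy"},
    "T2": {"female", "ER-positive"},
    "T3": {"HER2-positive", "no prior radiotherapy"},
    "T4": {"HER2-positive", "adequate renal function"},
}


def simulated_search(trials, questions, answers):
    """Ask one eligibility question per feature and narrow the trial set.

    questions: manually ordered list of features to ask about.
    answers:   feature -> True/False for the simulated patient.
    Returns the remaining trial ids after each question.
    """
    remaining = dict(trials)
    history = []
    for feature in questions:
        # A "no" answer drops trials that require the feature; a "yes"
        # or an unanswered question leaves the candidate set unchanged.
        if answers.get(feature, True) is False:
            remaining = {t: f for t, f in remaining.items() if feature not in f}
        history.append((feature, sorted(remaining)))
        if len(remaining) <= 1:
            break  # nothing left to narrow down
    return history


questions = ["ER-positive", "HER2-positive", "adequate renal function"]
answers = {
    "ER-positive": False,
    "HER2-positive": True,
    "adequate renal function": False,
}
history = simulated_search(trials, questions, answers)
for feature, left in history:
    print(f"{feature}: {left}")
```

Each question shrinks or preserves the candidate set, so ordering the questions by how many trials a "no" answer would exclude is one way such a search could reach a short list quickly.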