Zhang Kevin, Demner-Fushman Dina
College of Medicine and Life Sciences, University of Toledo, Toledo, OH, USA.
Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
J Am Med Inform Assoc. 2017 Jul 1;24(4):781-787. doi: 10.1093/jamia/ocw176.
To develop automated classification methods for eligibility criteria in ClinicalTrials.gov to facilitate patient-trial matching for specific populations such as persons living with HIV or pregnant women.
We annotated 891 interventional cancer trials from ClinicalTrials.gov according to whether their eligibility criteria admit human immunodeficiency virus (HIV)-positive patients. These annotations were used to develop classifiers based on regular expressions and machine learning (ML). After evaluating the classification of cancer trials for HIV-positive eligibility, we assessed how well the approach generalizes to a broader range of diseases and conditions: we annotated the eligibility criteria of 1570 of the most recent interventional trials from ClinicalTrials.gov for HIV-positive and pregnancy eligibility, then retrained and reevaluated the classifiers on these data.
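To make the regular-expression baseline concrete, the following is a minimal sketch of how such a classifier might look. The abstract does not give the actual patterns, label names, or how the criteria text was segmented, so the pattern, the three labels, and the split into inclusion and exclusion text are all hypothetical.

```python
import re

# Hypothetical pattern: the source does not list the expressions it used.
HIV_MENTION = re.compile(r"\b(hiv|human immunodeficiency virus)\b", re.IGNORECASE)


def classify_hiv_eligibility(inclusion: str, exclusion: str) -> str:
    """Assign one of three assumed labels from the criteria text.

    Assumes the eligibility criteria have already been split into
    inclusion and exclusion sections (an illustrative simplification).
    """
    if HIV_MENTION.search(exclusion):
        # HIV is named among the exclusion criteria.
        return "ineligible"
    if HIV_MENTION.search(inclusion):
        # HIV is named among the inclusion criteria.
        return "eligible"
    # HIV is not mentioned anywhere in the criteria.
    return "not_mentioned"


if __name__ == "__main__":
    print(classify_hiv_eligibility("Age >= 18 years", "Known HIV-positive status"))
```

A rule this simple misreads negated or conditional phrasing (for example, an exclusion that applies only to patients with uncontrolled infection), which is the kind of limitation a learned classifier is meant to address.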
On the cancer-HIV dataset, the baseline regex model, the bag-of-words ML classifier, and the ML classifier with named entity recognition (NER) achieved macro-averaged F2 scores of 0.77, 0.87, and 0.87, respectively; the addition of NER did not result in a significant performance improvement. On the general dataset, ML + NER achieved macro-averaged F2 scores of 0.91 and 0.85 for HIV and pregnancy, respectively.
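For readers unfamiliar with the metric, the sketch below computes a macro-averaged F2 score under the standard definition: F-beta is calculated per class with beta = 2, which weights recall more heavily than precision, and the per-class scores are then averaged without weighting. The function names and the toy labels are illustrative and are not taken from the study.

```python
def fbeta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_fbeta(y_true, y_pred, beta: float = 2.0) -> float:
    """Unweighted mean of per-class F-beta over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        scores.append(fbeta(tp, fp, fn, beta))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy labels, purely illustrative.
    truth = ["eligible", "eligible", "ineligible", "ineligible"]
    preds = ["eligible", "ineligible", "ineligible", "ineligible"]
    print(round(macro_fbeta(truth, preds), 4))
```

Because the average is unweighted, a rare class counts as much as a common one, so weak performance on an infrequent eligibility label pulls the score down noticeably.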
The eligibility status of specific patient populations, such as persons living with HIV and pregnant women, for clinical trials is of interest to both patients and clinicians. We show that it is feasible to develop a high-performing, automated trial classification system for eligibility status that can be integrated into consumer-facing search engines as well as patient-trial matching systems.