Hassanzadeh Hamed, Karimi Sarvnaz, Nguyen Anthony
CSIRO The Australian e-Health Research Centre, Brisbane, Australia.
CSIRO Data61, Sydney, Australia.
J Biomed Inform. 2020 May;105:103406. doi: 10.1016/j.jbi.2020.103406. Epub 2020 Mar 10.
Recruiting eligible patients for clinical trials is crucial for reliably answering specific questions about medical interventions and their evaluation. However, clinical trial recruitment is a bottleneck in clinical research and drug development. Our goal is to provide an approach towards automating this manual and time-consuming patient recruitment task using natural language processing and machine learning techniques. Specifically, our approach extracts key information from the series of narrative clinical documents in a patient's record and collates helpful evidence to make decisions on the eligibility of patients according to certain inclusion and exclusion criteria. Challenges in applying narrative clinical documents, such as differences in reporting styles and sub-languages, are addressed by enriching the documents with knowledge from domain ontologies in the form of semantic vector representations. We show that a machine learning model based on a Multi-Layer Perceptron (MLP) is more effective for the task than five other neural networks and four conventional machine learning models. Our approach achieves an overall micro-F1 score of 84% across 13 different eligibility criteria. Our experiments also indicate that semantically enriched documents are more effective than the original documents for cohort selection. Our system provides an end-to-end machine learning-based solution that achieves results comparable with the state of the art, which relies on hand-crafted rules or data-centric engineered features.
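To make the reported metric concrete, the following is a minimal sketch of how a micro-averaged F1 score can be computed over several eligibility criteria. It is not the authors' evaluation script: the function name `micro_f1`, the criterion names, and the labels are hypothetical, and it assumes binary labels (1 = criterion met, 0 = not met) with "met" treated as the positive class.

```python
from typing import Dict, List, Tuple


def micro_f1(per_criterion: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Pool true positives, false positives and false negatives over all
    criteria, then compute a single F1 score from the pooled counts."""
    tp = fp = fn = 0
    for gold, pred in per_criterion.values():
        for g, p in zip(gold, pred):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, F1 is undefined;
    # returning 0.0 here is a convention chosen for this sketch.
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold and predicted labels for two made-up criteria.
example = {
    "criterion_a": ([1, 0, 1, 1], [1, 0, 0, 1]),  # 2 TP, 1 FN
    "criterion_b": ([0, 0, 1, 0], [0, 1, 1, 0]),  # 1 TP, 1 FP
}

score = micro_f1(example)
print(f"micro-F1 = {score:.2f}")
```

Because the counts are pooled before the score is computed, criteria with more positive cases contribute more to the result than rare ones, which is the main difference from a macro average that weights each criterion equally.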