Janssen Research & Development, LLC, 1400 McKean Rd, Spring House, PA, 19477, USA.
Ther Innov Regul Sci. 2021 Mar;55(2):447-453. doi: 10.1007/s43441-020-00236-x. Epub 2020 Oct 30.
The ability to detect patterns and trends across protocol deviations (PDs) is key to ensuring high data quality and sufficient oversight of patient safety. In clinical trial operations, some business processes and work instructions hinder efficient trending of protocol deviations because a majority of protocol deviations are left unclassified. When this occurs, clinical teams are restricted in their ability to identify systemic issues or signals in the data. The unstructured text in protocol deviation descriptions is an important source of trial operations knowledge. Natural language processing (NLP) can make protocol deviation descriptions more accessible and can support information extraction and trend analysis. This paper reviews how the NLP technique of Term-Frequency Inverse-Document-Frequency (TF-IDF), combined with the supervised machine learning model of Support Vector Machines (SVM), and word embedding approaches such as word2vec can be used to categorize/label protocol deviations across multiple therapeutic areas. NLP is a key tool that will lead to more data-driven decisions in clinical trial operations.
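As a rough illustration of the TF-IDF plus SVM categorization described in the abstract, here is a minimal, standard-library-only Python sketch. It is not the authors' implementation: the toy deviation descriptions and the category names ("informed consent", "visit schedule", "study drug") are hypothetical and invented for illustration, the linear SVM is trained with a Pegasos-style stochastic subgradient solver (swapped in so the example needs no third-party packages), and multi-class labeling is handled one-vs-rest. The word2vec approach is not shown.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy PD descriptions and labels, invented for illustration only.
TRAIN = [
    ("Informed consent form signed after first study procedure", "informed consent"),
    ("Subject not re-consented with updated informed consent form version", "informed consent"),
    ("Consent obtained using outdated ICF version", "informed consent"),
    ("Week 4 visit performed outside the protocol visit window", "visit schedule"),
    ("Subject missed the day 15 visit", "visit schedule"),
    ("Follow-up visit conducted 5 days late, outside window", "visit schedule"),
    ("Study drug dose missed by subject for two days", "study drug"),
    ("Incorrect dose of study drug dispensed", "study drug"),
    ("Study drug stored above the allowed temperature", "study drug"),
]

def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def dot(w, x):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())

def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (labels +1/-1, no bias term) trained with
    Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1  # margin computed with current w
            for tok in w:  # shrink step from the L2 regulariser
                w[tok] *= 1.0 - eta * lam
            if violated:  # hinge loss is active: move w toward y_i * x_i
                for tok, v in xs[i].items():
                    w[tok] = w.get(tok, 0.0) + eta * ys[i] * v
    return w

def train_one_vs_rest(texts, labels, **svm_kwargs):
    """Fit the IDF table, then one binary SVM per category."""
    idf = fit_idf(texts)
    xs = [tfidf(text, idf) for text in texts]
    models = {}
    for label in sorted(set(labels)):
        ys = [1 if y == label else -1 for y in labels]
        models[label] = train_linear_svm(xs, ys, **svm_kwargs)
    return idf, models

def predict(text, idf, models):
    """Return the category whose SVM gives the highest decision score."""
    x = tfidf(text, idf)
    return max(models, key=lambda label: dot(models[label], x))

if __name__ == "__main__":
    idf, models = train_one_vs_rest([t for t, _ in TRAIN], [y for _, y in TRAIN])
    for text in ["Updated consent form not signed before procedure",
                 "Visit performed outside window",
                 "Incorrect drug dose dispensed"]:
        print(text, "->", predict(text, idf, models))
```

In practice a library implementation such as scikit-learn's `TfidfVectorizer` and `LinearSVC` would be the more likely choice; the hand-rolled version above is only meant to make each step (tokenization, IDF weighting, normalization, hinge-loss training, one-vs-rest prediction) visible.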