Tian Shubo, Yin Pengfei, Zhang Hansi, Erdengasileng Arslan, Bian Jiang, He Zhe
Department of Statistics, Florida State University, Tallahassee, USA.
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2023 Dec;2023:4426-4430. doi: 10.1109/bibm58861.2023.10385876. Epub 2024 Jan 18.
To enable electronic screening of eligible patients for clinical trials, free-text clinical trial eligibility criteria should be translated into a computable format. Natural language processing (NLP) techniques have the potential to automate this process. In this study, we explored a supervised multi-input multi-output (MIMO) sequence labelling model to parse eligibility criteria into combinations of fact and condition tuples. In experiments on a small manually annotated training dataset, the MIMO framework with a BERT-based encoder using all the input sequences achieved an overall lenient-level AUROC of 0.61. Although this performance is suboptimal, representing eligibility criteria as logical and semantically clear tuples can potentially make the subsequent translation of these tuples into database queries more reliable.
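To make the idea of fact and condition tuples concrete, the following is a minimal sketch of how one parsed criterion might be stored and rendered as a query predicate. It is not the authors' schema or code: the (subject, relation, object) tuple layout, the names `Triple`, `ParsedCriterion`, and `to_predicate`, the example criterion, and the column names `diagnosis` and `age_years` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Triple:
    """One (subject, relation, object) tuple; an assumed layout, not the paper's."""
    subject: str
    relation: str
    obj: str


@dataclass
class ParsedCriterion:
    """A free-text criterion together with its hypothetical tuples."""
    text: str
    facts: List[Triple] = field(default_factory=list)
    conditions: List[Triple] = field(default_factory=list)


def to_predicate(parsed: ParsedCriterion) -> str:
    """Join every fact and condition tuple into one AND-ed, SQL-like predicate."""
    parts = [f"{t.subject} {t.relation} {t.obj}"
             for t in parsed.facts + parsed.conditions]
    return " AND ".join(parts)


# Hand-written example; a trained sequence labelling model would produce
# the tuples instead. Column names are hypothetical.
example = ParsedCriterion(
    text="Patients with type 2 diabetes aged 18 years or older",
    facts=[Triple("diagnosis", "=", "'type 2 diabetes'")],
    conditions=[Triple("age_years", ">=", "18")],
)
predicate = to_predicate(example)
print(predicate)
```

Keeping the tuples as structured objects, rather than generating query text directly from the criterion, is one way the intermediate representation could make the later query-generation step easier to inspect and validate.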