Farnsworth Steele, Gurdin Gabrielle, Vargas Jorge, Mulyar Andriy, Lewinski Nastassja, McInnes Bridget T
Virginia Commonwealth University, 401 S. Main St., Richmond, VA 23284, USA.
J Biomed Inform. 2022 Feb;126:103970. doi: 10.1016/j.jbi.2021.103970. Epub 2021 Dec 14.
Systematic reviews are labor-intensive processes that combine all knowledge about a given topic into a coherent summary. Despite the high labor investment, they are necessary to create an exhaustive overview of the current evidence relevant to a research question. In this work, we evaluate three state-of-the-art supervised multi-label sequence classification systems for automatically identifying 24 different experimental design factors in the categories of Animal, Dose, Exposure, and Endpoint from journal articles describing experiments on the toxicity and health effects of environmental agents. We then present an in-depth analysis of the results: we evaluate the lexical diversity of the design parameters with respect to model performance, the impact of tokenization and non-contiguous mentions, and finally the dependencies between entities within each category. We demonstrate that, in general, algorithms that use embedded representations of the sequences outperform statistical algorithms, but that even these algorithms struggle with lexically diverse entities.
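The abstract does not state how lexical diversity is measured. One simple proxy, assumed here purely for illustration and not necessarily the authors' metric, is the ratio of distinct surface forms to total annotated mentions for each entity type: a value near 1 means almost every mention is worded differently, which is the situation the abstract reports as hard for all models. The entity labels `Species` and `DoseUnits` and the example mentions below are hypothetical.

```python
from collections import defaultdict


def lexical_diversity(mentions):
    """Assumed proxy: distinct surface forms / total mentions, per entity type.

    `mentions` is an iterable of (entity_type, surface_text) pairs.
    Surface forms are compared case-insensitively after stripping whitespace.
    """
    totals = defaultdict(int)   # entity_type -> number of mentions
    forms = defaultdict(set)    # entity_type -> set of normalized surface forms
    for etype, text in mentions:
        totals[etype] += 1
        forms[etype].add(text.lower().strip())
    return {etype: len(forms[etype]) / totals[etype] for etype in totals}


# Hypothetical annotated mentions (labels and strings are illustrative only).
example = [
    ("Species", "rat"), ("Species", "rats"), ("Species", "Rat"), ("Species", "mice"),
    ("DoseUnits", "mg/kg"), ("DoseUnits", "mg/kg"), ("DoseUnits", "mg/kg"), ("DoseUnits", "ppm"),
]

scores = lexical_diversity(example)
for etype, score in sorted(scores.items()):
    print(f"{etype}: {score:.2f}")  # prints "DoseUnits: 0.50" then "Species: 0.75"
```

Under this proxy, the repetitive `DoseUnits` mentions score lower than the more varied `Species` mentions. Note that lowercasing merges "rat" and "Rat" but not "rat" and "rats"; a lemmatizing normalizer would change the scores.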