Kauchak David, Leroy Gondy, Pei Menglu, Colina Sonia
Pomona College, Claremont, CA.
University of Arizona, Tucson, AZ.
AMIA Annu Symp Proc. 2020 Mar 4;2019:523-531. eCollection 2019.
Transition words add important information and help readers understand text. Our goal is to automatically detect transition words in the medical domain. We introduce a new dataset for identifying transition words, grouped into 16 types, that occur between adjacent sentence pairs in medical texts from English and Spanish Wikipedia (70K and 27K examples, respectively). We report classification results for a feedforward neural network with word embedding features. Overall, we detect the need for a transition word with 78% accuracy in English and 84% in Spanish. For individual transition word categories, performance varies widely and is related neither to the number of training examples nor to the number of transition words in the category. The best accuracy in English was for Exemplification words (82%) and in Spanish for Contrast words (96%).
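The abstract specifies only "a feedforward neural network with word embedding features" for detecting whether an adjacent sentence pair needs a transition word. The sketch below shows one plausible way to set up that binary decision and should not be read as the authors' implementation. It assumes each sentence is represented by the average of its word embeddings and that the two sentence vectors are concatenated. The network uses one tanh hidden layer and a sigmoid output trained with plain SGD on log loss. The 2-dimensional embeddings, the four sentence pairs, and their labels are all invented for illustration. A real system would use pretrained embeddings and the 70K/27K Wikipedia examples.

```python
import math
import random

# Hypothetical 2-dimensional toy embeddings (a real system would use pretrained vectors).
# Dimension 1 loosely encodes outcome polarity so that the toy task is learnable.
EMBEDDINGS = {
    "drug": [1.0, 0.0], "given": [0.0, 0.0],
    "surgery": [1.0, 0.0], "performed": [0.0, 0.0],
    "symptoms": [1.0, 0.0], "patient": [1.0, 0.0],
    "improved": [0.0, 1.0], "recovered": [0.0, 1.0],
    "worsened": [0.0, -1.0], "declined": [0.0, -1.0],
}
DIM = 2


def sentence_vector(tokens):
    """Average the embeddings of known tokens; return zeros if none are known."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def pair_features(first, second):
    """Concatenate the averaged vectors of two adjacent sentences (assumed feature design)."""
    return sentence_vector(first) + sentence_vector(second)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FeedForward:
    """One tanh hidden layer and a sigmoid output, trained with SGD on log loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, sigmoid(logit)

    def predict_proba(self, x):
        """Probability that the sentence pair needs a transition word."""
        return self.forward(x)[1]

    def train(self, data, epochs=500, lr=0.5):
        for _ in range(epochs):
            for x, y in data:
                hidden, p = self.forward(x)
                d_logit = p - y  # d(log loss)/d(logit) for a sigmoid output
                for j, h in enumerate(hidden):
                    # Backpropagate through tanh using w2[j] before updating it.
                    d_hidden = d_logit * self.w2[j] * (1.0 - h * h)
                    self.w2[j] -= lr * d_logit * h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hidden * xi
                    self.b1[j] -= lr * d_hidden
                self.b2 -= lr * d_logit


# Hypothetical labeled pairs: 1 = a transition word (e.g. a Contrast word such as "However") is needed.
PAIRS = [
    (["drug", "given"], ["symptoms", "improved"], 0),
    (["drug", "given"], ["symptoms", "worsened"], 1),
    (["surgery", "performed"], ["patient", "recovered"], 0),
    (["surgery", "performed"], ["patient", "declined"], 1),
]
DATA = [(pair_features(a, b), y) for a, b, y in PAIRS]

model = FeedForward(n_in=2 * DIM, n_hidden=4)
model.train(DATA)
for (a, b, y), (x, _) in zip(PAIRS, DATA):
    print(" ".join(a), "|", " ".join(b), "->", round(model.predict_proba(x), 3), "label", y)
```

The sketch covers only the binary "is a transition word needed?" decision. The per-category results in the abstract would instead call for a softmax output over the 16 transition-word types. Averaging embeddings discards word order, a simplifying assumption made here to keep the example short.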