Dept. of Computer Science, Jerusalem College of Technology-Lev Academic Center, Jerusalem, Israel.
Dept. of Computer Science, Bar-Ilan University, Ramat-Gan, Israel.
PLoS One. 2024 Feb 23;19(2):e0293196. doi: 10.1371/journal.pone.0293196. eCollection 2024.
In this research, we extract time-related expressions from a rabbinic text in a semi-automatic manner. These expressions usually appear next to rabbinic references (a name, nickname, acronym, or book name). The first step is to find all expressions that occur near references in the corpus. Not all phrases around the references are time-related, however, so these phrases are initially treated only as potential time-related expressions. To extract the actual time-related expressions, we formulate two new statistical functions and apply grammatical screening and heuristic methods. We tested these statistical functions, grammatical screenings, and heuristic methods on a corpus of responsa documents in which many rabbinic citations are known and marked. Together, the statistical functions and the screening methods reduced the set of potential time-related expressions by 99.88% (from 484,681 to 575).
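The candidate-collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_len` window size, the assumption that each marked reference occupies a single token position, and the English toy sentence are all hypothetical. The two statistical functions themselves are not specified in the abstract, so they are not reproduced here; the sketch only gathers the word n-grams adjacent to marked references and checks the reported reduction figure.

```python
from collections import Counter


def candidate_phrases(tokens, ref_positions, max_len=3):
    """Count word n-grams (1..max_len) directly before or after each marked reference.

    Hypothetical helper: assumes every reference is a single token whose
    index is listed in ref_positions.
    """
    counts = Counter()
    for pos in ref_positions:
        for n in range(1, max_len + 1):
            # n-gram that ends immediately before the reference
            if pos - n >= 0:
                counts[tuple(tokens[pos - n:pos])] += 1
            # n-gram that starts immediately after the reference
            if pos + 1 + n <= len(tokens):
                counts[tuple(tokens[pos + 1:pos + 1 + n])] += 1
    return counts


def reduction_percent(initial, remaining):
    """Percentage of the initial candidates removed by filtering."""
    return 100.0 * (initial - remaining) / initial


# Toy, made-up sentence; "REF" stands in for a marked rabbinic reference.
tokens = "as wrote in his time REF of blessed memory".split()
counts = candidate_phrases(tokens, ref_positions=[5], max_len=2)

# Figures reported in the abstract: 484,681 candidates reduced to 575.
reported_reduction = round(reduction_percent(484681, 575), 2)
```

In a real pipeline, the resulting counts would be the input to the statistical and grammatical filters, which decide which of the surrounding phrases are actually time-related.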