Improving the representation and conversion of mathematical formulae by considering their textual context.

Authors

Schubotz Moritz, Greiner-Petter André, Scharpf Philipp, Meuschke Norman, Cohl Howard S, Gipp Bela

Affiliations

Information Science Group, University of Konstanz, Germany.

Applied and Computational Mathematics Division, NIST, U.S.A.

Publication information

TUGboat (Provid). 2018 May;39(3). doi: 10.1145/3197026.3197058.

Abstract

Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial for communicating information, e.g., in scientific papers, and for performing computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires methods for converting between mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task, consisting of a newly created test collection, an extensive, manually curated gold standard, and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversion; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate of mathematical format conversion. Our benchmark dataset facilitates future research on mathematical format conversion as well as on many problems in mathematical information retrieval. Because we annotated all components of formulae, e.g., identifiers, operators, and other entities, and linked them to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.

Similar articles

Do the Math: Making Mathematics in Wikipedia Computable.
IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):4384-4395. doi: 10.1109/TPAMI.2022.3195261. Epub 2023 Mar 7.

linkedISA: semantic representation of ISA-Tab experimental metadata.
BMC Bioinformatics. 2014;15 Suppl 14(Suppl 14):S4. doi: 10.1186/1471-2105-15-S14-S4. Epub 2014 Nov 27.