Penev Lyubomir, Lyal Christopher HC, Weitzman Anna, Morse David R, King David, Sautter Guido, Georgiev Teodor, Morris Robert A, Catapano Terry, Agosti Donat
Bulgarian Academy of Sciences & Pensoft Publishers, Sofia, Bulgaria.
Zookeys. 2011(150):89-116. doi: 10.3897/zookeys.150.2213. Epub 2011 Nov 28.
We review the three most widely used XML schemas for marking up taxonomic texts: TaxonX, TaxPub and taXMLit. Each is described in terms of its development history, current status, implementation, and use cases. We also discuss the concept of a "taxon treatment" as it applies to marking up taxonomic texts in XML. TaxonX and taXMLit are designed primarily for legacy literature: the former is more lightweight and focuses on the recovery of taxon treatments, while the latter provides a much more detailed set of tags to facilitate data extraction and analysis. TaxPub is an extension of the National Library of Medicine Document Type Definition (NLM DTD) for taxonomy, focussed on layout and recovery, and as such is best suited to the mark-up of new publications and their archiving in PubMed Central. All three schemas have their advantages and shortcomings and can be used for different purposes.