Oxford e-Research Centre, Engineering Science, University of Oxford, Oxford, UK.
Northrop Grumman Information Systems Health IT, Rockville, MD, USA.
J Am Med Inform Assoc. 2018 Jan 1;25(1):13-16. doi: 10.1093/jamia/ocx119.
The DAta Tag Suite (DATS) is a model supporting dataset description, indexing, and discovery. It is available as an annotated serialization with schema.org, a vocabulary used by major search engines, thus making the datasets discoverable on the web. DATS underlies DataMed, the National Institutes of Health Big Data to Knowledge Data Discovery Index prototype, which aims to provide a "PubMed for datasets." The experience gained while indexing a heterogeneous range of >60 repositories in DataMed helped in evaluating DATS's entities, attributes, and scope. In this work, 3 additional exemplary and diverse data sources were mapped to DATS by their representatives or experts, offering a deep scan of DATS fitness against a new set of existing data. The procedure, including feedback from users and implementers, resulted in DATS implementation guidelines and best practices, and identification of a path for evolving and optimizing the model. Finally, the work exposed additional needs when defining datasets for indexing, especially in the context of clinical and observational information.
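As a rough illustration of what a schema.org-annotated dataset description can look like, the sketch below builds a minimal JSON-LD record using the schema.org `Dataset` and `DataDownload` types. It is not the DATS schema itself: the helper name `make_dataset_jsonld`, the choice of fields, and all example values (including the identifier and the example.org URL) are assumptions made for illustration only.

```python
import json


def make_dataset_jsonld(name, description, identifier, keywords, download_url):
    """Build a minimal schema.org Dataset description as a JSON-LD-ready dict.

    Hypothetical helper: it uses only schema.org vocabulary terms and does
    not reproduce the DATS entities or attributes.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "keywords": list(keywords),
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
        },
    }


# Placeholder values, not taken from any real repository.
record = make_dataset_jsonld(
    name="Example observational cohort dataset",
    description="Illustrative record for a dataset index.",
    identifier="example:0001",
    keywords=["clinical", "observational"],
    download_url="https://example.org/datasets/0001.csv",
)

# Serialize for embedding in a <script type="application/ld+json"> element.
print(json.dumps(record, indent=2))
```

Embedding such a record in a dataset landing page is the usual way of exposing schema.org markup to web crawlers. A real DATS serialization would carry the model's own entities and attributes, which this sketch does not attempt to cover.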