Standards-based curation of a decade-old digital repository dataset of molecular information.

Authors

Harvey Matthew J, Mason Nicholas J, McLean Andrew, Murray-Rust Peter, Rzepa Henry S, Stewart James J P

Affiliations

High Performance Computing Service, Imperial College London, London, SW7 2AZ UK.

Department of Chemistry, Imperial College London, South Kensington Campus, London, SW7 2AZ UK.

Publication information

J Cheminform. 2015 Aug 27;7:43. doi: 10.1186/s13321-015-0093-3. eCollection 2015.

DOI:10.1186/s13321-015-0093-3

PMID:26322133

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4550659/

Abstract

BACKGROUND

We report the curation of 158,122 molecular geometries derived from the NCI set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method. The data were originally deposited in 2005 into the Cambridge DSpace repository as a data collection.

RESULTS

The curation procedures included annotating the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance, and adding new metadata describing the entries, together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools.
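
As a rough illustration of the metadata mapping step described above, the following Python sketch builds a DataCite-style XML record for a single molecule entry. It is not the authors' transformation: the input dictionary layout, the function name `to_datacite`, the example DOI, names, and titles are all hypothetical, and the kernel-4 namespace is an assumption (the original work may have targeted a different schema version). Only a handful of core DataCite properties are emitted.

```python
import xml.etree.ElementTree as ET

# Assumed target namespace (DataCite kernel 4); the schema version
# actually used in the paper may differ.
DATACITE_NS = "http://datacite.org/schema/kernel-4"


def to_datacite(record: dict) -> ET.Element:
    """Map a hypothetical per-molecule metadata record to a DataCite-style element."""
    # Serialise the DataCite namespace as the default (unprefixed) namespace.
    ET.register_namespace("", DATACITE_NS)

    def q(tag: str) -> str:
        # Qualify a tag name with the DataCite namespace.
        return f"{{{DATACITE_NS}}}{tag}"

    resource = ET.Element(q("resource"))

    # One persistent identifier per molecule, matching the granularity model.
    identifier = ET.SubElement(resource, q("identifier"), identifierType="DOI")
    identifier.text = record["doi"]

    creators = ET.SubElement(resource, q("creators"))
    for name in record["creators"]:
        creator = ET.SubElement(creators, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name

    titles = ET.SubElement(resource, q("titles"))
    ET.SubElement(titles, q("title")).text = record["title"]

    ET.SubElement(resource, q("publisher")).text = record["publisher"]
    ET.SubElement(resource, q("publicationYear")).text = str(record["year"])

    resource_type = ET.SubElement(
        resource, q("resourceType"), resourceTypeGeneral="Dataset"
    )
    resource_type.text = record["resource_type"]
    return resource


# Hypothetical input record; none of these values come from the dataset itself.
example = {
    "doi": "10.5072/example-molecule-0001",
    "creators": ["Example, Alice", "Example, Bob"],
    "title": "Hypothetical molecule entry 0001",
    "publisher": "Example Repository",
    "year": 2015,
    "resource_type": "Molecular geometry",
}

xml_text = ET.tostring(to_datacite(example), encoding="unicode")
print(xml_text)
```

In a real pipeline the input record would presumably be extracted from the repository's own metadata (for example the descriptive metadata attached to each CML document) rather than written by hand, and the output would be validated against the DataCite schema before any DOI is registered.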

CONCLUSIONS

We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") should be conducted in a manner that facilitates automatic periodic curation.

Graphical abstract: Standards and metadata-based curation of a decade-old digital repository dataset of molecular information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/deaa/4551767/693f51a3885c/13321_2015_93_Figa_HTML.jpg

Similar articles

Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers.

J Cheminform. 2015 Aug 8;7:37. doi: 10.1186/s13321-015-0081-7. eCollection 2015.

A metadata-driven approach to data repository design.

J Cheminform. 2017 Jan 24;9:4. doi: 10.1186/s13321-017-0190-6. eCollection 2017.

Understanding the value of curation: A survey of US data repository curation practices and perceptions.

PLoS One. 2024 Jun 14;19(6):e0301171. doi: 10.1371/journal.pone.0301171. eCollection 2024.

A metadata schema for data objects in clinical research.

Trials. 2016 Nov 24;17(1):557. doi: 10.1186/s13063-016-1686-5.

Assessment of a demonstrator repository for individual clinical trial data built upon DSpace.

F1000Res. 2020 Apr 29;9:311. doi: 10.12688/f1000research.23468.2. eCollection 2020.

An open-source framework for neuroscience metadata management applied to digital reconstructions of neuronal morphology.

Brain Inform. 2020 Mar 26;7(1):2. doi: 10.1186/s40708-020-00103-3.

Development of an open metadata schema for prospective clinical research (openPCR) in China.

Methods Inf Med. 2014;53(1):39-46. doi: 10.3414/ME13-01-0008. Epub 2013 Dec 9.

QLMDR: a GraphQL query language for ISO 11179-based metadata repositories.

BMC Med Inform Decis Mak. 2019 Mar 18;19(1):45. doi: 10.1186/s12911-019-0794-z.

PaperBot: open-source web-based search and metadata organization of scientific literature.

BMC Bioinformatics. 2019 Jan 24;20(1):50. doi: 10.1186/s12859-019-2613-z.

Cited by

A metadata-driven approach to data repository design.

J Cheminform. 2017 Jan 24;9:4. doi: 10.1186/s13321-017-0190-6. eCollection 2017.

ChemEngine: harvesting 3D chemical structures of supplementary data from PDF files.

J Cheminform. 2016 Dec 29;8:73. doi: 10.1186/s13321-016-0175-x. eCollection 2016.

Standards-based metadata procedures for retrieving data for display or mining utilizing persistent (data-DOI) identifiers.

J Cheminform. 2015 Aug 8;7:37. doi: 10.1186/s13321-015-0081-7. eCollection 2015.
