Kim Sunghwan, Thiessen Paul A, Cheng Tiejun, Yu Bo, Shoemaker Benjamin A, Wang Jiyao, Bolton Evan E, Wang Yanli, Bryant Stephen H
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894 USA.
J Cheminform. 2016 Jun 10;8:32. doi: 10.1186/s13321-016-0142-6. eCollection 2016.
PubChem is an open archive consisting of three primary public databases (BioAssay, Compound, and Substance). It contains information on a broad range of chemical entities, including small molecules, lipids, carbohydrates, and (chemically modified) amino acid and nucleic acid sequences (including siRNA and miRNA). As of November 2015, PubChem contains more than 150 million depositor-provided chemical substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results from over 1 million biological assay records.
Many PubChem records (substances, compounds, and assays) include depositor-provided cross-references to scientific articles in PubMed. Some PubChem contributors provide bioactivity data extracted from scientific articles. Literature-derived bioactivity data complement high-throughput screening (HTS) data from the concluded NIH Molecular Libraries Program and other HTS projects. Some journals provide PubChem with information on chemicals that appear in their newly published articles, enabling concurrent publication of scientific articles in journals and associated data in public databases. In addition, PubChem links records to PubMed articles indexed with the Medical Subject Heading (MeSH) controlled vocabulary thesaurus.
Literature information, both provided by depositors and derived from MeSH annotations, can be accessed using PubChem's web interfaces, enabling users to explore information available in literature related to PubChem records beyond typical web search results.
Graphical abstract: Literature information for PubChem records is derived from various sources.