Toward a comprehensive drug ontology: extraction of drug-indication relations from diverse information sources.

Author information

Sharp Mark E

Affiliation

Scientific Information Management, Merck Research Laboratories, 770 Sumneytown Pike, West Point, Philadelphia, PA, 19486, USA.

Publication information

J Biomed Semantics. 2017 Jan 10;8(1):2. doi: 10.1186/s13326-016-0110-0.

Abstract

BACKGROUND

Drug ontologies could help pharmaceutical researchers overcome information overload and speed the pace of drug discovery, thus benefiting the industry and patients alike. Drug-disease relations, specifically drug-indication relations, are a prime candidate for representation in ontologies. There is a wealth of available drug-indication information, but structuring and integrating it is challenging.

RESULTS

We created a drug-indication database (DID) of data from 12 openly available, commercially available, and proprietary information sources, integrated by terminological normalization to UMLS and other authorities. Across sources, there are 29,964 unique raw drug/chemical names, 10,938 unique raw indication "target" terms, and 192,008 unique raw drug-indication pairs. Normalizing drug/chemical names to CAS numbers or to UMLS concepts reduced the unique name count to 91% or 85% of the raw count, respectively, and to 84% when the two were combined. Normalizing indication "target" terms to UMLS "phenotypic-type" concepts reduced the unique term count to 57% of the raw count. The 12 sources of raw data varied widely in coverage (numbers of unique drug/chemical and indication concepts and relations), generally in line with the idiosyncrasies of each source, but had strikingly little overlap, suggesting that we successfully achieved source/raw data diversity.
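The percentages above compare the number of unique terms after normalization with the number of unique raw terms. The following is a minimal sketch of that calculation, not the author's actual pipeline: the function name `normalization_ratio`, the toy term list, and the identifiers `ID_1` and `ID_2` are all hypothetical, and it assumes that a raw term with no mapping still counts as its own unique entry.

```python
def normalization_ratio(raw_terms, concept_map):
    """Return (unique raw count, unique normalized count, ratio).

    raw_terms:   iterable of raw name strings (duplicates allowed)
    concept_map: dict from raw name to a concept identifier (hypothetical IDs)
    Assumption: an unmapped raw term is kept as its own unique entry.
    """
    raw_unique = set(raw_terms)
    normalized = set()
    for term in raw_unique:
        if term in concept_map:
            # Synonyms collapse onto the same concept identifier.
            normalized.add(("concept", concept_map[term]))
        else:
            # No mapping found: keep the raw term itself.
            normalized.add(("raw", term))
    return len(raw_unique), len(normalized), len(normalized) / len(raw_unique)


# Toy data, invented purely for illustration.
raw_terms = ["aspirin", "acetylsalicylic acid", "ASA",
             "ibuprofen", "Advil", "compound-x"]
concept_map = {
    "aspirin": "ID_1",
    "acetylsalicylic acid": "ID_1",
    "ASA": "ID_1",
    "ibuprofen": "ID_2",
    "Advil": "ID_2",
}

raw_n, norm_n, ratio = normalization_ratio(raw_terms, concept_map)
print(f"{raw_n} raw names -> {norm_n} normalized ({ratio:.0%} of raw)")
```

In this toy case, six raw names collapse to three normalized entries. The figures reported in the abstract (91%, 85%, 84%, 57%) are ratios of the same kind, computed over the full set of sources.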

CONCLUSIONS

The DID is a database of structured drug-indication relations intended to facilitate building practical, comprehensive, integrated drug ontologies. The DID itself is not an ontology, but could be converted to one more easily than the contributing raw data. Our methodology could be adapted to the creation of other structured drug-disease databases such as for contraindications, precautions, warnings, and side effects.
