Classification and analysis of a large collection of in vivo bioassay descriptions.

Authors

Magdalena Zwierzyna, John P. Overington

Affiliations

BenevolentAI, London, United Kingdom.

Institute of Cardiovascular Science, University College London, London, United Kingdom.

Publication information

PLoS Comput Biol. 2017 Jul 5;13(7):e1005641. doi: 10.1371/journal.pcbi.1005641. eCollection 2017 Jul.

DOI:10.1371/journal.pcbi.1005641

PMID:28678787

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5517062/

Abstract

Testing potential drug treatments in animal disease models is a decisive step of all preclinical drug discovery programs. Yet, despite the importance of such experiments for translational medicine, there have been relatively few efforts to comprehensively and consistently analyze the data produced by in vivo bioassays. This is partly due to their complexity and the lack of accepted reporting standards: publicly available animal screening data are only accessible in unstructured free-text format, which hinders computational analysis. In this study, we use text mining to extract information from the descriptions of over 100,000 drug screening-related assays in rats and mice. We retrieve our dataset from ChEMBL, an open-source literature-based database focused on preclinical drug discovery. We show that in vivo assay descriptions can be effectively mined for relevant information, including experimental factors that might influence the outcome and reproducibility of animal research: genetic strains, experimental treatments, and phenotypic readouts used in the experiments. We further systematize the extracted information using an unsupervised language model (Word2Vec), which learns semantic similarities between terms and phrases, allowing identification of related animal models and classification of entire assay descriptions. In addition, we show that random forest models trained on features generated by Word2Vec can predict the class of drugs tested in different in vivo assays with high accuracy. Finally, we combine information mined from text with curated annotations stored in ChEMBL to investigate the patterns of usage of different animal models across a range of experiments, drug classes, and disease areas.
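As a rough illustration of the extraction step described in the abstract, the sketch below tags strains, treatments, and readouts in free-text assay descriptions by dictionary lookup. It is not the authors' pipeline: the vocabularies, the example descriptions, and the function names (`compile_terms`, `extract_factors`, `count_strains`) are all hypothetical and chosen only for illustration, and plain dictionary matching is an assumed stand-in for whatever text-mining methods the study used.

```python
import re
from collections import Counter

# Hypothetical, hand-picked vocabularies for illustration only; the term
# lists used in the study are not given in the abstract.
VOCAB = {
    "strain": ["Sprague-Dawley", "Wistar", "C57BL/6", "BALB/c"],
    "treatment": ["carrageenan", "streptozotocin", "pentylenetetrazole"],
    "readout": ["paw edema", "blood glucose", "seizure"],
}


def compile_terms(terms):
    """Build one case-insensitive pattern matching any term as a whole token."""
    # Longest terms first, so overlapping terms prefer the longer match.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    # Lookarounds stop matches inside longer alphanumeric tokens.
    return re.compile(r"(?<![A-Za-z0-9])(?:" + body + r")(?![A-Za-z0-9])",
                      re.IGNORECASE)


PATTERNS = {label: compile_terms(terms) for label, terms in VOCAB.items()}
# Map lower-cased surface forms back to their canonical spelling.
CANONICAL = {t.lower(): t for terms in VOCAB.values() for t in terms}


def extract_factors(description):
    """Return the canonical terms of each category found in one description."""
    found = {}
    for label, pattern in PATTERNS.items():
        hits = {CANONICAL[m.group(0).lower()]
                for m in pattern.finditer(description)}
        found[label] = sorted(hits)
    return found


def count_strains(descriptions):
    """Count how many descriptions mention each strain."""
    counts = Counter()
    for text in descriptions:
        counts.update(extract_factors(text)["strain"])
    return counts


# Made-up descriptions written in the general style of assay records.
descriptions = [
    "Antiinflammatory activity in Wistar rat assessed as inhibition of "
    "carrageenan-induced paw edema",
    "Antidiabetic activity in streptozotocin-induced C57BL/6 mouse assessed "
    "as reduction in blood glucose",
    "Anticonvulsant activity in wistar rat against "
    "pentylenetetrazole-induced seizure",
]

for text in descriptions:
    print(extract_factors(text))
print(count_strains(descriptions))
```

Structured fields of this kind could then feed later steps such as term embedding or classification. A fixed dictionary only finds the terms and spellings it already lists, which is one reason to treat this as a starting point rather than a complete approach.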

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cb2/5517062/06c386692484/pcbi.1005641.g001.jpg
