The Text-mining based PubChem Bioassay neighboring analysis.

Affiliation

National Center for Biotechnology Information, US National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

BMC Bioinformatics. 2010 Nov 8;11:549. doi: 10.1186/1471-2105-11-549.

DOI:10.1186/1471-2105-11-549

PMID:21059237

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098095/

Abstract

BACKGROUND

In recent years, the number of high-throughput screening (HTS) assays deposited in PubChem has grown quickly. As a result, the volume of both structured information (i.e. molecular structures and bioactivities) and unstructured information (such as descriptions of bioassay experiments) has been increasing exponentially. It has therefore become more demanding and challenging to assemble the bioactivity data efficiently by mining this large body of information to identify and interpret the relationships among the diverse bioassay experiments. In this work, we propose a text-mining based approach for bioassay neighboring analysis that uses the unstructured text descriptions contained in the PubChem BioAssay database.

RESULTS

The neighboring analysis is carried out by evaluating the cosine score of each bioassay pair and the fraction of overlaps among the human-curated neighbors. Our results from the cosine score distribution analysis and the assay neighbor clustering analysis on all PubChem bioassays suggest that strong correlations among the bioassays can be identified from their conceptual relevance. A comparison with existing assay neighboring methods suggests that the text-mining based bioassay neighboring approach provides meaningful linkages among the PubChem bioassays, and complements the existing methods by identifying additional relationships among the bioassay entries.
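As a rough illustration of the two quantities named above, the following Python sketch computes a cosine score between two assay descriptions and an overlap fraction against a curated neighbor set. The abstract does not specify the tokenization, the term weighting, the score threshold, or the exact definition of the overlap fraction, so everything here is an assumption: raw term counts stand in for whatever weighting the authors used, the overlap is taken as the share of curated neighbors recovered by the text-based neighbors, and the function names, assay IDs, and descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens (assumed, minimal tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_score(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description has no defined direction
    return dot / (norm_a * norm_b)


def text_neighbors(descriptions, threshold):
    """Map each assay ID to the other assays whose cosine score meets the threshold."""
    vectors = {aid: term_counts(text) for aid, text in descriptions.items()}
    neighbors = {aid: set() for aid in descriptions}
    ids = sorted(descriptions)
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine_score(vectors[first], vectors[second]) >= threshold:
                neighbors[first].add(second)
                neighbors[second].add(first)
    return neighbors


def overlap_fraction(predicted, curated):
    """Share of curated neighbors that also appear among the predicted neighbors (assumed definition)."""
    if not curated:
        return 0.0
    return len(predicted & curated) / len(curated)


# Made-up descriptions for illustration only; these are not real PubChem records.
descriptions = {
    "assay_a": "luciferase reporter assay for kinase inhibitors",
    "assay_b": "luciferase reporter assay for kinase activators",
    "assay_c": "cell viability counter screen",
}
# The 0.5 threshold is arbitrary and chosen only for this toy example.
neighbors = text_neighbors(descriptions, threshold=0.5)
```

In this toy input the first two descriptions share five of their six terms, so they become neighbors of each other, while the third shares no terms with either and has none. A realistic pipeline would likely need stop-word removal and some form of term weighting before the scores become meaningful.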

CONCLUSIONS

The text-mining based bioassay neighboring analysis is efficient for correlating bioassays and studying different aspects of a biological process, tasks that are otherwise difficult to achieve with existing neighboring procedures because of the lack of specific annotations and structured information. We suggest that the text-mining based bioassay neighboring analysis can be used as a standalone tool or as a complement to the PubChem bioassay neighboring process, to enable efficient integration of assay results and to generate hypotheses for the discovery of bioactivities of the tested reagents.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0829/3098095/1283bb057ed7/1471-2105-11-549-1.jpg

Similar articles

Classification and analysis of a large collection of in vivo bioassay descriptions.

PLoS Comput Biol. 2017 Jul 5;13(7):e1005641. doi: 10.1371/journal.pcbi.1005641. eCollection 2017 Jul.

PubChem's BioAssay Database.

Nucleic Acids Res. 2012 Jan;40(Database issue):D400-12. doi: 10.1093/nar/gkr1132. Epub 2011 Dec 2.

Using the BioAssay Ontology for analyzing high-throughput screening data.

J Biomol Screen. 2015 Mar;20(3):402-15. doi: 10.1177/1087057114563493. Epub 2014 Dec 15.

Generating the Blood Exposome Database Using a Comprehensive Text Mining and Database Fusion Approach.

Environ Health Perspect. 2019 Sep;127(9):97008. doi: 10.1289/EHP4713. Epub 2019 Sep 26.

An overview of the PubChem BioAssay resource.

Nucleic Acids Res. 2010 Jan;38(Database issue):D255-66. doi: 10.1093/nar/gkp965. Epub 2009 Nov 19.

PubChem BioAssay: A Decade's Development toward Open High-Throughput Screening Data Sharing.

SLAS Discov. 2017 Jul;22(6):655-666. doi: 10.1177/2472555216685069. Epub 2017 Jan 13.

Developing and validating predictive decision tree models from mining chemical structural fingerprints and high-throughput screening data in PubChem.

BMC Bioinformatics. 2008 Sep 25;9:401. doi: 10.1186/1471-2105-9-401.

PubChem BioAssay: 2014 update.

Nucleic Acids Res. 2014 Jan;42(Database issue):D1075-82. doi: 10.1093/nar/gkt978. Epub 2013 Nov 5.

Big data in chemical toxicity research: the use of high-throughput screening assays to identify potential toxicants.

Chem Res Toxicol. 2014 Oct 20;27(10):1643-51. doi: 10.1021/tx500145h. Epub 2014 Sep 16.

Cited by

Cheminformatics and artificial intelligence for accelerating agrochemical discovery.

Front Chem. 2023 Nov 29;11:1292027. doi: 10.3389/fchem.2023.1292027. eCollection 2023.

ACTG1 and TLR3 are biomarkers for alcohol-associated hepatocellular carcinoma.

Oncol Lett. 2019 Feb;17(2):1714-1722. doi: 10.3892/ol.2018.9757. Epub 2018 Nov 26.

Constructing Genetic Networks using Biomedical Literature and Rare Event Classification.

Sci Rep. 2017 Nov 17;7(1):15784. doi: 10.1038/s41598-017-16081-2.

Differential protein-coding gene and long noncoding RNA expression in smoking-related lung squamous cell carcinoma.

Thorac Cancer. 2017 Nov;8(6):672-681. doi: 10.1111/1759-7714.12510. Epub 2017 Sep 26.

Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes.

PLoS One. 2014 Jan 24;9(1):e86703. doi: 10.1371/journal.pone.0086703. eCollection 2014.

An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data.

Anal Chim Acta. 2014 Jan 2;806:117-27. doi: 10.1016/j.aca.2013.10.050. Epub 2013 Nov 6.
