QSAR modeling of imbalanced high-throughput screening data in PubChem.

Authors

Zakharov Alexey V, Peach Megan L, Sitzmann Markus, Nicklaus Marc C

Affiliation

CADD Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, NCI-Frederick, 376 Boyles St., Frederick, Maryland 21702, United States.

Publication information

J Chem Inf Model. 2014 Mar 24;54(3):705-12. doi: 10.1021/ci400737s. Epub 2014 Feb 28.

Abstract

Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and "biological" descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).
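
The abstract does not say which balancing strategies were tested, so the following is only a minimal sketch of one generic option (random undersampling of the inactive class) and of why plain accuracy misleads on such data. It is not the authors' method and does not use GUSAR or QNA descriptors. The function names `undersample` and `balanced_accuracy`, the 10-to-990 active/inactive split, and the label convention (1 = active, 0 = inactive) are all assumptions made for illustration.

```python
import random


def undersample(labels, ratio=1.0, seed=0):
    """Hypothetical helper: keep every active (label 1) and a random
    subset of inactives (label 0), at most `ratio` inactives per active.
    Returns the sorted indices of the retained compounds."""
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(inactives), int(round(ratio * len(actives))))
    kept_inactives = rng.sample(inactives, n_keep)
    return sorted(actives + kept_inactives)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1/0)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


if __name__ == "__main__":
    # Toy stand-in for an imbalanced HTS assay: 10 actives, 990 inactives.
    labels = [1] * 10 + [0] * 990

    # A trivial "model" that calls every compound inactive.
    all_inactive = [0] * len(labels)
    accuracy = sum(t == p for t, p in zip(labels, all_inactive)) / len(labels)
    print(accuracy)                                  # 0.99
    print(balanced_accuracy(labels, all_inactive))   # 0.5

    # A 1:1 training subset keeps all 10 actives plus 10 random inactives.
    subset = undersample(labels, ratio=1.0)
    print(len(subset))                               # 20
```

In this toy case the all-inactive predictor scores 0.99 on plain accuracy but only 0.5 on balanced accuracy, which is why models built on imbalanced sets are usually compared on class-sensitive metrics and held-out data.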
