Context-based preprocessing of molecular docking data.

Publication information

BMC Genomics. 2013;14 Suppl 6(Suppl 6):S6. doi: 10.1186/1471-2164-14-S6-S6. Epub 2013 Oct 25.

DOI: 10.1186/1471-2164-14-S6-S6
PMID: 24564276
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3909228/
Abstract

BACKGROUND

Data preprocessing is a major step in data mining. Known techniques can be applied, or new ones developed, to improve data quality so that mining results become more accurate and intelligible. Bioinformatics is one area with high demand for generating comprehensive models from large datasets. In this article, we propose a context-based data preprocessing approach for mining molecular docking simulation results. The test cases used a fully-flexible receptor (FFR) model of the Mycobacterium tuberculosis InhA enzyme (FFR_InhA) and four different ligands.

RESULTS

We generated an initial set of attributes and their respective instances. To improve this initial set, we applied two selection strategies: the first was based on our context-based approach, while the second used the Correlation-based Feature Selection (CFS) machine learning algorithm. We also produced an extra dataset containing features selected by combining our context strategy with the CFS algorithm. To demonstrate the effectiveness of the proposed method, we evaluated its performance using predictive measures (RMSE, MAE, correlation, and nodes) and context measures (precision, recall, and F-score).
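The measures named above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and the toy inputs are hypothetical, and the context measures are computed here under the assumption that precision and recall compare the selected attributes against a reference set of context-relevant attributes; the abstract does not spell out that definition. The `cfs_merit` function follows the standard CFS subset heuristic, which rewards features correlated with the target and penalizes features correlated with each other.

```python
import math


def predictive_measures(y_true, y_pred):
    """RMSE, MAE, and Pearson correlation between observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    corr = cov / math.sqrt(var_t * var_p)
    return {"rmse": rmse, "mae": mae, "correlation": corr}


def context_measures(selected, relevant):
    """Precision, recall, and F-score of a selected attribute set.

    Assumption: `relevant` is a reference set of attributes that domain
    knowledge marks as meaningful (hypothetical here).
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    fscore = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "fscore": fscore}


def cfs_merit(feature_target_corrs, mean_feature_feature_corr):
    """CFS merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(feature_target_corrs)
    mean_rcf = sum(abs(r) for r in feature_target_corrs) / k
    return k * mean_rcf / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)


if __name__ == "__main__":
    # Toy values only; they are not taken from the paper.
    print(predictive_measures([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
    print(context_measures({"a", "b", "c", "d"}, {"a", "b", "e"}))
    print(cfs_merit([0.5, 0.5], 0.0))
```

In this sketch, a subset whose features are fully redundant scores a lower merit than an equally predictive subset of uncorrelated features, which is the trade-off CFS is built around.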

CONCLUSIONS

Statistical analysis of the results shows that the proposed context-based data preprocessing approach significantly improves predictive and context measures and outperforms the CFS algorithm. Context-based data preprocessing improves mining results by producing more interpretable models, which makes it well suited to practical applications in molecular docking simulations using FFR models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1059/3909228/12b471acf75c/1471-2164-14-S6-S6-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1059/3909228/d0f581b087d4/1471-2164-14-S6-S6-1.jpg
