Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins.

Affiliations

Department of Biotechnology, Indian Institute of Technology Madras, Chennai 600036, India and National Institute of Biomedical Innovation, Osaka, Japan.

Publication information

Nucleic Acids Res. 2013 Sep;41(16):7606-14. doi: 10.1093/nar/gkt544. Epub 2013 Jun 20.

Abstract

Protein-DNA complexes play vital roles in many cellular processes through the interactions of amino acids with DNA. Several computational methods have been developed for predicting the interacting residues in DNA-binding proteins using sequence and/or structural information. These methods show different levels of accuracy, which may depend on the choice of data sets used in training, the feature sets selected for developing a predictive model, the ability of the models to capture information useful for prediction, or a combination of these factors. In many cases, different methods are likely to produce similar results, whereas in others, the predictors may return contradictory predictions. In this situation, a priori estimates of prediction performance applicable to the system being investigated would help biologists choose the best method for designing their experiments. In this work, we have constructed unbiased, stringent and diverse data sets for DNA-binding proteins based on various biologically relevant considerations: (i) seven structural classes, (ii) 86 folds, (iii) 106 superfamilies, (iv) 194 families, (v) 15 binding motifs, (vi) single/double-stranded DNA, (vii) DNA conformation (A, B, Z, etc.), (viii) three functions and (ix) disordered regions. These data sets were culled to be non-redundant at sequence identities of 25% and 40% and used to evaluate the performance of 11 different methods for which online services or standalone programs are available. We observed that the best performing methods for each of the data sets showed significant biases toward the data sets selected for their benchmark. Our analysis revealed important data set features, which could be used to estimate these context-specific biases and hence suggest the best method to be used for a given problem. We have developed a web server, which considers these features on demand and displays the best method that the investigator should use. The web server is freely available at http://www.biotech.iitm.ac.in/DNA-protein/. Further, we have grouped the methods based on their complexity and analyzed the performance. The information gained in this work could be effectively used to select the best method for designing experiments.
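
The idea of recommending a predictor from data-set features can be illustrated with a minimal sketch. It assumes a table of per-category performance scores is already available and simply returns the highest-scoring method for the category of interest. The method names, category labels, and numbers below are hypothetical placeholders, not results from the paper, and the function names are invented for illustration; the logic of the actual web server is not described in the abstract.

```python
# Hypothetical benchmark table: (method, data-set category) -> score.
# All names and values are illustrative placeholders, not published results.
SCORES = {
    ("method_A", "ssDNA"): 0.61,
    ("method_B", "ssDNA"): 0.68,
    ("method_A", "dsDNA"): 0.74,
    ("method_B", "dsDNA"): 0.70,
    ("method_C", "dsDNA"): 0.66,
}


def best_method_per_category(scores):
    """Return {category: (method, score)} keeping the top score per category."""
    best = {}
    for (method, category), score in scores.items():
        # Keep the first entry seen, then replace it only on a strictly higher score.
        if category not in best or score > best[category][1]:
            best[category] = (method, score)
    return best


def recommend(scores, category):
    """Return the name of the best-scoring method for one category."""
    best = best_method_per_category(scores)
    if category not in best:
        raise KeyError(f"no benchmark scores for category {category!r}")
    return best[category][0]


if __name__ == "__main__":
    for category, (method, score) in sorted(best_method_per_category(SCORES).items()):
        print(f"{category}: {method} ({score:.2f})")
```

A single score per method and category is a simplification: a fuller treatment would likely combine several measures and several data-set features (class, fold, binding motif, DNA strand type) before making a recommendation.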
