Sukumar N, Krein Michael P, Embrechts Mark J
Rensselaer Exploratory Center for Cheminformatics Research and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA.
Methods Mol Biol. 2012;910:165-94. doi: 10.1007/978-1-61779-965-5_9.
The vast amounts of chemical and biological data available through robotic high-throughput assays and micro-array technologies require computational techniques for visualization, analysis, and predictive modeling. Predictive cheminformatics and bioinformatics employ statistical methods to mine these data for hidden correlations and to retrieve molecules or genes with desirable biological activity from large databases, for the purpose of drug development. While many statistical methods are commonly employed and widely accessible, their proper use requires due consideration of data representation and preprocessing, model validation and estimation of the domain of applicability, similarity assessment, the nature of the structure-activity landscape, and model interpretation. This chapter reviews these considerations in light of the current state of the art in statistical modeling and summarizes best practices in predictive cheminformatics.