University of Basel, Psychiatric University Clinic, Basel, Switzerland.
Expert Opin Drug Discov. 2012 Apr;7(4):341-52. doi: 10.1517/17460441.2012.668182. Epub 2012 Feb 29.
Decision tree induction (DTI) is a powerful means of modeling data without much prior preparation. Its models are readable by humans, robust, and easily applied in real-world applications, a combination of features rarely found together in other commonly used machine learning paradigms. DTI is widely used in disciplines ranging from economics to medicine and is an intriguing option in pharmaceutical research, especially when dealing with large data stores.
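To make the idea of an induced, human-readable model concrete, the sketch below grows a small binary tree top-down with CART-style greedy splits on Gini impurity. The function names (`induce`, `best_split`, `to_rules`), the Gini criterion, the depth limit and the toy data are assumptions for illustration only; this is not the tooling covered in the review. Each test is a simple threshold on a single feature, so no feature scaling is needed, which is one reason trees need little data preparation.

```python
from collections import Counter


class Node:
    """A tree node: a leaf carries `label`; an internal node tests row[feature] <= threshold."""

    def __init__(self, label=None, feature=None, threshold=None, left=None, right=None):
        self.label = label
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted Gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def induce(X, y, depth=0, max_depth=3, min_samples=2):
    """Grow a tree by greedy recursive partitioning (hypothetical helper)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_samples or gini(y) == 0.0:
        return Node(label=majority)
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):  # no split reduces impurity
        return Node(label=majority)
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return Node(
        feature=f,
        threshold=t,
        left=induce([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_samples),
        right=induce([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_samples),
    )


def predict(node, row):
    """Follow the threshold tests from the root down to a leaf."""
    while node.label is None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


def to_rules(node, names, indent=""):
    """Render the tree as nested if/else rules a human can read."""
    if node.label is not None:
        return [f"{indent}predict {node.label}"]
    test = f"{names[node.feature]} <= {node.threshold:g}"
    return (
        [f"{indent}if {test}:"]
        + to_rules(node.left, names, indent + "    ")
        + [f"{indent}else:"]
        + to_rules(node.right, names, indent + "    ")
    )


# Synthetic two-feature toy data (hypothetical, for illustration only).
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [6.0, 2.0], [7.0, 3.0], [8.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
tree = induce(X, y)
print("\n".join(to_rules(tree, ["x0", "x1"])))
```

On this toy data the induced model is a single readable rule: if `x0 <= 4.5`, predict 0, otherwise predict 1.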
This review covers the automated technologies available for creating decision trees and other rule-based models efficiently, even from large datasets such as chemical libraries. The authors discuss the need for properly documented and validated models. Lastly, the authors cover several case studies in hit discovery, drug metabolism and toxicology, and drug surveillance, and compare DTI with other established techniques.
DTI is a competitive and easy-to-use tool in basic research as well as in hit and drug discovery. Its strengths lie in its ability to handle many different data formats, the visual nature of its models, and the small computational effort needed to implement them in real-world systems. Limitations include a lack of robustness and a tendency to produce over-fitted models for certain types of data. As with any modeling technique, proper validation and quality measures are of utmost importance.
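Because over-fitting is one of the limitations named above, a tree should be judged on data it was not grown on. A minimal sketch of k-fold cross-validation follows, assuming a hypothetical depth-1 tree (a decision stump) as the learner to keep the example short. Assigning sample i to fold i % k is a simplification; real studies would normally shuffle and stratify the folds. All function names here are illustrative.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Assign sample i to fold i % k; return a list of (train, test) index lists."""
    folds = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        folds.append((train, test))
    return folds


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Depth-1 tree: the single (feature, threshold) split with the fewest training errors."""
    base = majority(y)
    # Fallback model: no split, always predict the majority class.
    best = (sum(lab != base for lab in y), None, None, base, base)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)


def predict_stump(model, row):
    f, t, l_lab, r_lab = model
    if f is None:
        return l_lab
    return l_lab if row[f] <= t else r_lab


def cross_validate(X, y, fit, predict, k=3):
    """Fit on k-1 folds, score accuracy on the held-out fold, repeat for every fold."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return scores


# Synthetic one-feature toy data (hypothetical, for illustration only).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
print(cross_validate(X, y, fit_stump, predict_stump, k=3))
```

Every stump here fits its training folds perfectly, yet the held-out accuracies are 0.5, 1.0 and 1.0: in the first fold the learned threshold (4.0) misclassifies the boundary sample x = 4.0. Training accuracy alone therefore says little about how well a tree generalizes.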