Wassermann Anne Mai, Nisius Britta, Vogt Martin, Bajorath Jürgen
Department of Chemical Biology, University of Bonn, Bonn, Germany.
Methods Mol Biol. 2012;819:43-55. doi: 10.1007/978-1-61779-465-0_4.
The identification of molecular descriptors that can distinguish between different compound classes is of paramount importance in chemoinformatics. To aid in the identification of such discriminatory descriptors, concepts from information theory have been adapted. In an earlier study, an approach termed Differential Shannon Entropy (DSE) was introduced for descriptor profiling to detect and quantify compound database-dependent differences in the information content and value range distribution of descriptors. Because the DSE approach was intrinsically limited in its ability to select compound class-specific descriptors by comparing data sets of very different size, it has recently been extended to Mutual Information-DSE (MI-DSE). Herein, DSE, MI-DSE, and the Shannon entropy concept underlying both information-theoretic approaches are introduced and compared, and differences between their application areas are discussed.
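As a rough illustration of the Shannon entropy concept mentioned above, the following minimal sketch bins a descriptor's values into a histogram, computes the entropy of that histogram, and computes the textbook mutual information between class membership and descriptor bin for two compound sets. It is not the DSE or MI-DSE formulation of the chapter, whose exact definitions are not given in this abstract. The function names, the equal-width binning, and the toy descriptor values are all assumptions made for the example.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi] (assumed binning scheme)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the upper edge so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as raw bin counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mutual_information(counts_a, counts_b):
    """Textbook I(class; bin) = H(bin) - sum over classes of p(class) * H(bin | class).

    Generic illustration only; not the chapter's MI-DSE definition.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    pooled = [a + b for a, b in zip(counts_a, counts_b)]
    return (shannon_entropy(pooled)
            - (n_a / n) * shannon_entropy(counts_a)
            - (n_b / n) * shannon_entropy(counts_b))


# Hypothetical descriptor values for two compound classes (made-up numbers).
class_a = [0.1, 0.2, 0.3, 0.4]
class_b = [0.6, 0.7, 0.8, 0.9]
hist_a = histogram(class_a, 0.0, 1.0, 2)
hist_b = histogram(class_b, 0.0, 1.0, 2)
mi_separated = mutual_information(hist_a, hist_b)
```

In this toy case the two classes occupy disjoint bins, so knowing the bin fully determines the class and the mutual information reaches its maximum for two equally sized classes. Identical histograms would instead give a value of zero, meaning the descriptor carries no class-specific information.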