Gallo Cristian A, Cecchini Rocio L, Carballido Jessica A, Micheletto Sandra, Ponzoni Ignacio
Brief Bioinform. 2016 Sep;17(5):758-70. doi: 10.1093/bib/bbv074. Epub 2015 Sep 22.
Gene expression measurements represent the most important source of biological data used to uncover the interactions and functions of genes. Several data mining and machine learning algorithms proposed for this purpose require, in many cases, some form of data discretization before inference can be performed. The choice of discretization process has a major impact on the design and outcome of the inference algorithms, and a number of relevant issues must be weighed when making it. This study reviews the current state-of-the-art discretization techniques, together with the key issues to consider when designing or selecting a discretization approach for gene expression data.
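To make the notion of discretization concrete, the following is a minimal sketch of two common unsupervised schemes, equal-width and equal-frequency binning, applied to the expression values of a single gene. The abstract does not name specific methods, so the choice of these two schemes, the function names, the three-level default, and the sample values are all assumptions made for illustration; they are not presented as the techniques the review recommends.

```python
import numpy as np


def equal_width_discretize(values, n_bins=3):
    """Map each value to a bin index, using bins of equal width over [min, max]."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use only the interior edges so that the maximum falls in the last bin.
    return np.digitize(values, edges[1:-1])


def equal_frequency_discretize(values, n_bins=3):
    """Map each value to a bin index, using quantile cut points."""
    values = np.asarray(values, dtype=float)
    cut_points = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, cut_points)


# Hypothetical expression profile of one gene across six samples.
expression = [0.1, 0.2, 0.3, 0.4, 5.0, 9.0]

print(equal_width_discretize(expression).tolist())      # [0, 0, 0, 0, 1, 2]
print(equal_frequency_discretize(expression).tolist())  # [0, 0, 1, 1, 2, 2]
```

The two schemes assign different levels to the same skewed profile: equal-width binning places the four low values in one level, while equal-frequency binning spreads the samples evenly across levels. This is one way the choice of discretization can affect downstream inference.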