König Inke R, Auerbach Jonathan, Gola Damian, Held Elizabeth, Holzinger Emily R, Legault Marc-André, Sun Rui, Tintle Nathan, Yang Hsin-Chou
Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.
Department of Statistics, Columbia University, New York, NY, 10027, USA.
BMC Genet. 2016 Feb 3;17 Suppl 2(Suppl 2):1. doi: 10.1186/s12863-015-0315-8.
In the analysis of current genomic data, machine learning and data mining techniques have become increasingly attractive given the rising complexity of the projects. As part of Genetic Analysis Workshop 19, approaches from this domain were explored, motivated mainly by two starting points. First, if the genomic data have an underlying structure, data mining might identify it and thus improve downstream association analyses. Second, computational methods for machine learning need further development to deal efficiently with the current wealth of data.
In discussing the results and experiences from the machine learning and data mining approaches, six common messages were extracted. These depict the current state of these approaches in their application to complex genomic data. Although some challenges remain for future studies, important steps forward were taken in integrating different data types and evaluating the evidence. Mining the data for underlying genetic or phenotypic structure and using this information in subsequent analyses proved extremely helpful and is likely to become even more useful with more complex data sets.
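As a rough illustration of the first starting point (mining the data for hidden structure, then feeding that structure into the association analysis), the sketch below assumes the hidden structure is population stratification and uses principal component analysis as the mining step. PCA is a common choice for this, not necessarily the method used by any workshop group. The simulation, its parameters, and the helper names `simulate`, `top_pcs`, and `snp_t_stat` are hypothetical. The phenotype depends only on subpopulation membership, so any SNP association is spurious. Adjusting for the leading principal component should shrink it.

```python
import numpy as np


def simulate(n_per_pop=200, n_snps=300, seed=0):
    """Hypothetical data: two subpopulations with different allele frequencies."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    # Allele frequencies drawn independently for each subpopulation (assumption)
    freq = rng.uniform(0.1, 0.9, size=(2, n_snps))
    # Genotypes coded 0/1/2 as Binomial(2, p), p taken from the individual's subpopulation
    geno = rng.binomial(2, freq[pop])
    # Phenotype depends on subpopulation only: no SNP has a true effect
    y = 1.0 * pop + rng.normal(size=pop.size)
    return geno.astype(float), y, pop, freq


def top_pcs(geno, k=1):
    """Data-mining step: leading principal component scores of standardized genotypes."""
    mu = geno.mean(axis=0)
    sd = geno.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    z = (geno - mu) / sd
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def snp_t_stat(y, snp, covariates=None):
    """Downstream step: OLS t statistic for the SNP, optionally adjusted for covariates."""
    cols = [np.ones_like(y), snp]
    if covariates is not None:
        cols.extend(covariates.T)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


geno, y, pop, freq = simulate()
# SNP whose allele frequency differs most between the subpopulations
snp_idx = int(np.argmax(np.abs(freq[1] - freq[0])))
pcs = top_pcs(geno, k=1)
t_naive = snp_t_stat(y, geno[:, snp_idx])
t_adj = snp_t_stat(y, geno[:, snp_idx], covariates=pcs)
print(f"naive t = {t_naive:.2f}, PC-adjusted t = {t_adj:.2f}")
```

Here the unadjusted test reports a strong but spurious association for the most differentiated SNP. After adding the first principal component as a covariate, the statistic drops toward what is expected under no effect, because that component largely recovers the subpopulation labels.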