Machine learning and data mining in complex genomic data--a review on the lessons learned in Genetic Analysis Workshop 19.

Authors

König Inke R, Auerbach Jonathan, Gola Damian, Held Elizabeth, Holzinger Emily R, Legault Marc-André, Sun Rui, Tintle Nathan, Yang Hsin-Chou

Affiliations

Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

Department of Statistics, Columbia University, New York, NY, 10027, USA.

Publication information

BMC Genet. 2016 Feb 3;17 Suppl 2(Suppl 2):1. doi: 10.1186/s12863-015-0315-8.

Abstract

In the analysis of current genomic data, application of machine learning and data mining techniques has become more attractive given the rising complexity of the projects. As part of the Genetic Analysis Workshop 19, approaches from this domain were explored, mostly motivated from two starting points. First, assuming an underlying structure in the genomic data, data mining might identify this and thus improve downstream association analyses. Second, computational methods for machine learning need to be developed further to efficiently deal with the current wealth of data. In the course of discussing results and experiences from the machine learning and data mining approaches, six common messages were extracted. These depict the current state of these approaches in the application to complex genomic data. Although some challenges remain for future studies, important forward steps were taken in the integration of different data types and the evaluation of the evidence. Mining the data for underlying genetic or phenotypic structure and using this information in subsequent analyses proved to be extremely helpful and is likely to become of even greater use with more complex data sets.
