Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

Author information

He Dan, Kuhn David, Parida Laxmi

Affiliations

IBM T.J. Watson Research, Yorktown Heights, NY, USA.

USDA-ARS Subtropical Horticultural Research Station, Miami, FL, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i37-i43. doi: 10.1093/bioinformatics/btw249.

Abstract

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by modeling all marker effects simultaneously. Genetic trait prediction is usually formulated as a linear regression model. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling all the traits jointly may therefore improve prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether the different traits share the same genotype matrix. We adapt multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem, and we propose a few strategies to reduce the least-squares error of the predictions from these algorithms. Our experiments show that modeling multiple traits together can improve prediction accuracy for correlated traits.
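The two problem settings named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use MALSAR. The assumptions here are ours: genotypes are coded 0/1/2, the data are simulated, no intercept is fitted, the shared-matrix case uses plain ridge regression, and the separate-matrix case uses a mean-regularized multitask ridge chosen only for brevity. The function names `ridge_multi_output` and `mean_regularized_mtl` are hypothetical.

```python
import numpy as np

def ridge_multi_output(X, Y, lam=1.0):
    """Shared genotype matrix X (n x p), trait matrix Y (n x k).
    Closed-form ridge; returns marker effects B (p x k).
    Note: this baseline fits each trait column independently."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def mean_regularized_mtl(Xs, ys, lam=1.0, n_iter=50):
    """Task t has its own genotype matrix Xs[t] (n_t x p) and trait
    vector ys[t] (n_t,). Approximately minimizes
        sum_t ||Xs[t] @ w_t - ys[t]||^2 + lam * sum_t ||w_t - m||^2
    by alternating between the per-task effects w_t and their mean m."""
    p = Xs[0].shape[1]
    m = np.zeros(p)
    W = np.zeros((p, len(Xs)))
    for _ in range(n_iter):
        for t, (X, y) in enumerate(zip(Xs, ys)):
            # Setting the gradient w.r.t. w_t to zero gives
            # (X'X + lam I) w_t = X'y + lam m
            W[:, t] = np.linalg.solve(X.T @ X + lam * np.eye(p),
                                      X.T @ y + lam * m)
        m = W.mean(axis=1)  # the optimal m is the mean of the w_t
    return W, m

# Simulated data (illustrative only): k correlated traits over p markers.
rng = np.random.default_rng(0)
n, p, k = 120, 30, 3
shared = rng.normal(size=(p, 1))
B_true = shared + 0.3 * rng.normal(size=(p, k))  # correlated marker effects

# Setting 1: all traits share one genotype matrix.
X = rng.integers(0, 3, size=(n, p)).astype(float)
Y = X @ B_true + rng.normal(size=(n, k))
B_hat = ridge_multi_output(X, Y, lam=1.0)

# Setting 2: each trait is measured on its own set of samples.
Xs = [rng.integers(0, 3, size=(40, p)).astype(float) for _ in range(k)]
ys = [Xs[t] @ B_true[:, t] + rng.normal(size=40) for t in range(k)]
W_hat, m_hat = mean_regularized_mtl(Xs, ys, lam=5.0)
```

In the second function, `lam` controls how strongly the per-trait marker effects are pulled toward a common mean, which is one simple way for correlated traits to borrow strength from each other. The ridge baseline in the first function has no such coupling.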

AVAILABILITY AND IMPLEMENTATION

The programs we used are either publicly available or were obtained directly from the cited authors, for example the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not yet been published and is available upon request.

CONTACT

dhe@us.ibm.com.
