Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

Author information

He Dan, Kuhn David, Parida Laxmi

Affiliations

IBM T.J. Watson Research, Yorktown Heights, NY, USA.

USDA-ARS Subtropical Horticultural Research Station, Miami, FL, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i37-i43. doi: 10.1093/bioinformatics/btw249.

PMID:27307640

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908333/

Abstract

UNLABELLED

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically for a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by modeling all marker effects simultaneously. Genetic trait prediction is usually formulated as a linear regression problem. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling all traits jointly may therefore improve prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether or not the different traits share the same genotype matrix. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem, and proposed several strategies to reduce the least-squares error of their predictions. Our experiments show that modeling multiple traits together could improve prediction accuracy for correlated traits.
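The single-trait linear-regression formulation described above can be illustrated with a minimal sketch. It assumes that each biallelic genotype is encoded as the count of one allele (AA = 0, Aa = 1, aa = 2), that traits are centered so no intercept is needed, and that ridge regression is the linear model; none of these choices is specified in the abstract, and the names `ENCODING`, `encode`, `solve` and `ridge` are hypothetical. Ridge is used here only because it stays well posed when markers outnumber samples.

```python
from typing import List, Sequence

# Hypothetical encoding: count of the second allele at one biallelic marker.
ENCODING = {"AA": 0.0, "Aa": 1.0, "aa": 2.0}


def encode(genotypes: Sequence[Sequence[str]]) -> List[List[float]]:
    """Map an n x p table of genotype strings (samples x markers) to numbers."""
    return [[ENCODING[g] for g in row] for row in genotypes]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge(x: List[List[float]], y: List[float], lam: float) -> List[float]:
    """Marker effects w minimizing ||X w - y||^2 + lam * ||w||^2 (centered y, no intercept)."""
    p = len(x[0])
    # Normal equations: (X^T X + lam * I) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)
```

Calling `ridge` once per trait on the same genotype matrix gives the independent single-trait baseline; the multitrait approaches described in the abstract instead try to share information across correlated traits.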

AVAILABILITY AND IMPLEMENTATION

The programs we used are either publicly available or were obtained directly from the cited authors, such as the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not been published yet and is available upon request.
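Packages such as MALSAR implement regularized multitask least-squares formulations in which the per-task regressions are coupled through a shared penalty. The sketch below illustrates that idea with one standard formulation, mean-regularized multitask learning, chosen here purely for illustration; it is not claimed to be one of the algorithms evaluated in the paper, nor MALSAR's implementation or API. It minimizes sum_t ||X_t w_t - y_t||^2 + rho * sum_t ||w_t - w_bar||^2 by plain gradient descent, where w_bar is the mean of the per-trait marker-effect vectors; the function names `matvec` and `mean_regularized_mtl` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(x: Matrix, w: Sequence[float]) -> List[float]:
    """Matrix-vector product X w."""
    return [sum(a * b for a, b in zip(row, w)) for row in x]


def mean_regularized_mtl(xs: Sequence[Matrix], ys: Sequence[Sequence[float]],
                         rho: float, iters: int = 5000) -> List[List[float]]:
    """Gradient descent on sum_t ||X_t w_t - y_t||^2 + rho * sum_t ||w_t - w_bar||^2.

    Each trait t has its own genotype matrix X_t (multitask layout); passing the
    same matrix for every trait gives the shared-genotype layout.
    """
    t_count, p = len(xs), len(xs[0][0])
    # 2 * (max_t ||X_t||_F^2 + rho) upper-bounds the gradient's Lipschitz constant,
    # so a step size of its inverse guarantees convergence.
    fro = max(sum(v * v for row in x for v in row) for x in xs)
    lr = 1.0 / (2.0 * (fro + rho))
    ws = [[0.0] * p for _ in range(t_count)]
    for _ in range(iters):
        w_bar = [sum(w[j] for w in ws) / t_count for j in range(p)]
        new_ws = []
        for x, y, w in zip(xs, ys, ws):
            resid = [pred - yi for pred, yi in zip(matvec(x, w), y)]
            # Gradient for trait t: the derivative of w_bar contributes a term
            # proportional to sum_s (w_s - w_bar), which is zero.
            grad = [2.0 * sum(row[j] * r for row, r in zip(x, resid))
                    + 2.0 * rho * (w[j] - w_bar[j]) for j in range(p)]
            new_ws.append([wj - lr * g for wj, g in zip(w, grad)])
        ws = new_ws
    return ws
```

With rho = 0 each trait is fitted independently by least squares; increasing rho pulls the per-trait marker effects toward their common mean, which is one simple way of letting correlated traits borrow strength from each other.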

CONTACT

dhe@us.ibm.com.

Similar articles

Does encoding matter? A novel view on the quantitative genetic trait prediction problem.

BMC Bioinformatics. 2016 Jul 19;17 Suppl 9(Suppl 9):272. doi: 10.1186/s12859-016-1127-1.

Data-driven encoding for quantitative genetic trait prediction.

BMC Bioinformatics. 2015;16 Suppl 1(Suppl 1):S10. doi: 10.1186/1471-2105-16-S1-S10. Epub 2015 Feb 18.

Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection.

J Dairy Sci. 2018 May;101(5):4279-4294. doi: 10.3168/jds.2017-13366. Epub 2018 Mar 15.

Leveraging input and output structures for joint mapping of epistatic and marginal eQTLs.

Bioinformatics. 2012 Jun 15;28(12):i137-46. doi: 10.1093/bioinformatics/bts227.

Accounting for trait architecture in genomic predictions of US Holstein cattle using a weighted realized relationship matrix.

Genet Sel Evol. 2015 Apr 2;47(1):24. doi: 10.1186/s12711-015-0100-1.

Joint prediction of multiple quantitative traits using a Bayesian multivariate antedependence model.

Heredity (Edinb). 2015 Jul;115(1):29-36. doi: 10.1038/hdy.2015.9. Epub 2015 Apr 15.

Accuracy of prediction of simulated polygenic phenotypes and their underlying quantitative trait loci genotypes using real or imputed whole-genome markers in cattle.

Genet Sel Evol. 2015 Dec 23;47:99. doi: 10.1186/s12711-015-0179-4.

A multi-trait Bayesian method for mapping QTL and genomic prediction.

Genet Sel Evol. 2018 Mar 24;50(1):10. doi: 10.1186/s12711-018-0377-y.

Effects of number of training generations on genomic prediction for various traits in a layer chicken population.

Genet Sel Evol. 2016 Mar 19;48:22. doi: 10.1186/s12711-016-0198-9.

Cited by

Advances in multi-trait genomic prediction approaches: classification, comparative analysis, and perspectives.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf211.

CropGS-Hub: a comprehensive database of genotype and phenotype resources for genomic prediction in major crops.

Nucleic Acids Res. 2024 Jan 5;52(D1):D1519-D1529. doi: 10.1093/nar/gkad1062.

Partial least squares enhance multi-trait genomic prediction of potato cultivars in new environments.

Sci Rep. 2023 Jun 19;13(1):9947. doi: 10.1038/s41598-023-37169-y.

(Quasi) multitask support vector regression with heuristic hyperparameter optimization for whole-genome prediction of complex traits: a case study with carcass traits in broilers.

G3 (Bethesda). 2023 Aug 9;13(8). doi: 10.1093/g3journal/jkad109.

Modeling genotype × environment interaction for single and multitrait genomic prediction in potato (Solanum tuberosum L.).

G3 (Bethesda). 2023 Feb 9;13(2). doi: 10.1093/g3journal/jkac322.

Multi-trait genome prediction of new environments with partial least squares.

Front Genet. 2022 Sep 5;13:966775. doi: 10.3389/fgene.2022.966775. eCollection 2022.

Accounting for Correlation Between Traits in Genomic Prediction.

Methods Mol Biol. 2022;2467:285-327. doi: 10.1007/978-1-0716-2205-6_10.

Multi-Trait Multi-Environment Genomic Prediction for End-Use Quality Traits in Winter Wheat.

Front Genet. 2022 Jan 31;13:831020. doi: 10.3389/fgene.2022.831020. eCollection 2022.

Bayesian multitrait kernel methods improve multienvironment genome-based prediction.

G3 (Bethesda). 2022 Feb 4;12(2). doi: 10.1093/g3journal/jkab406.

Accounting for epistasis improves genomic prediction of phenotypes with univariate and bivariate models across environments.

Theor Appl Genet. 2021 Sep;134(9):2913-2930. doi: 10.1007/s00122-021-03868-1. Epub 2021 Jun 11.

