Does encoding matter? A novel view on the quantitative genetic trait prediction problem.

Authors

He Dan, Parida Laxmi

Affiliation

IBM T.J. Watson Research, Yorktown Heights, NY, USA.

Publication information

BMC Bioinformatics. 2016 Jul 19;17 Suppl 9(Suppl 9):272. doi: 10.1186/s12859-016-1127-1.

Abstract

BACKGROUND

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by modeling all marker effects simultaneously. Genetic trait prediction is usually formulated as a linear regression model, which requires a quantitative encoding of the genotypes. There is a large body of work on prediction algorithms, but no existing work has investigated the effect of the encoding on the genetic trait prediction problem.

METHODS

In this work, we view the genetic trait prediction problem from a novel angle: as a multiple regression problem on categorical data, which requires encoding the categorical data as numerical data. We further propose two novel encoding methods and show that they generate numerical features with higher predictive power.
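To make the categorical-data view concrete, the sketch below encodes the three genotype classes of a biallelic marker in two common ways, an additive 0/1/2 allele count and a one-hot indicator, and fits a ridge regression on each. It is a minimal illustration under stated assumptions: the genotype labels, the simulated data, the function names and the ridge penalty are all hypothetical, and neither encoding is one of the two methods proposed in the paper, which the abstract does not describe.

```python
import numpy as np

GENOTYPES = ("AA", "Aa", "aa")  # hypothetical labels for a biallelic marker


def encode_additive(genos):
    """Map each genotype to its count of the 'a' allele (0, 1 or 2)."""
    lookup = {"AA": 0.0, "Aa": 1.0, "aa": 2.0}
    return np.array([[lookup[g] for g in row] for row in genos])


def encode_one_hot(genos):
    """Map each genotype to three indicator columns, one per class."""
    rows = []
    for row in genos:
        feats = []
        for g in row:
            feats.extend(1.0 if g == c else 0.0 for c in GENOTYPES)
        rows.append(feats)
    return np.array(rows)


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def ridge_predict(X, w, b):
    """Apply a fitted linear model to an encoded genotype matrix."""
    return X @ w + b


# Simulated data (an assumption for illustration): 200 samples, 5 markers.
rng = np.random.default_rng(0)
genos = rng.choice(GENOTYPES, size=(200, 5))
# Marker 0 is fully dominant: Aa and aa have the same effect on the trait.
y = np.where(genos[:, 0] == "AA", 0.0, 1.0) + 0.1 * rng.standard_normal(200)

for name, encode in (("additive", encode_additive), ("one-hot", encode_one_hot)):
    X = encode(genos)
    w, b = ridge_fit(X[:150], y[:150])          # train on the first 150 samples
    mse = np.mean((ridge_predict(X[150:], w, b) - y[150:]) ** 2)
    print(f"{name}: held-out MSE = {mse:.4f}")
```

The simulated trait has a dominance effect, so the heterozygote is not the midpoint of the two homozygotes. A single additive column cannot represent that pattern with one slope, whereas the one-hot columns can, which is one simple way the choice of encoding can change what a linear model is able to fit.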

RESULTS AND DISCUSSION

Our experiments show that our methods are superior to the other encoding methods for both the single-marker model and the epistasis model, and that the quantitative genetic trait prediction problem depends heavily on the encoding of genotypes under both models.
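As a rough illustration of why the encoding also matters for an epistasis model, the sketch below builds pairwise interaction features as products of marker columns under two numeric codings of the same genotypes. The helper name and the toy matrices are hypothetical, and the product-of-columns form is a common generic choice, not necessarily the epistasis model used in the paper.

```python
from itertools import combinations

import numpy as np


def add_pairwise_interactions(X):
    """Append the product of every pair of marker columns to X."""
    pairs = list(combinations(range(X.shape[1]), 2))
    if not pairs:
        return X.copy()
    inter = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, inter])


# Two markers, three samples, coded as allele counts 0/1/2 (toy data).
X_012 = np.array([[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]])
# The same genotypes coded as -1/0/1.
X_centered = X_012 - 1.0

# The last column of each result is the interaction feature.
print(add_pairwise_interactions(X_012)[:, 2])
print(add_pairwise_interactions(X_centered)[:, 2])
```

For these three samples the interaction column is (0, 1, 0) under the 0/1/2 coding and (-1, 0, -1) under the -1/0/1 coding, so the same genotypes yield different interaction features, and a penalized regression may weight them differently.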

CONCLUSIONS

We conducted a detailed analysis of the performance of the hybrid encodings. To our knowledge, this is the first work to discuss the effect of encodings on the genetic trait prediction problem.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5681/4959353/24da2f93122f/12859_2016_1127_Fig1_HTML.jpg