Random Forests approach for identifying additive and epistatic single nucleotide polymorphisms associated with residual feed intake in dairy cattle.

Author information

Yao C, Spurlock D M, Armentano L E, Page C D, VandeHaar M J, Bickhart D M, Weigel K A

Affiliation

Department of Dairy Science, University of Wisconsin, Madison 53706.

Publication information

J Dairy Sci. 2013 Oct;96(10):6716-29. doi: 10.3168/jds.2012-6237. Epub 2013 Aug 9.

Abstract

Feed efficiency is an economically important trait in the beef and dairy cattle industries. Residual feed intake (RFI) is a measure of partial efficiency that is independent of production level per unit of body weight. The objective of this study was to identify significant associations between single nucleotide polymorphism (SNP) markers and RFI in dairy cattle using the Random Forests (RF) algorithm. Genomic data included 42,275 SNP genotypes for 395 Holstein cows, whereas phenotypic measurements were daily RFI from 50 to 150 d postpartum. Residual feed intake was defined as the difference between an animal's feed intake and the average intake of its cohort, after adjustment for year and season of calving, year and season of measurement, age at calving nested within parity, days in milk, milk yield, body weight, and body weight change. Random Forests is a widely used machine-learning algorithm that has been applied to classification and regression problems. By analyzing the tree structures produced within RF, the 25 most frequent pairwise SNP interactions were reported as possible epistatic interactions. The importance scores that are generated by RF take into account both main effects of variables and interactions between variables, and the most negative value of all importance scores can be used as the cutoff level for declaring SNP effects as significant. Ranking by importance scores, 188 SNP surpassed the threshold, among which 38 SNP were mapped to RFI quantitative trait loci (QTL) regions reported in a previous study in beef cattle, and 2 SNP were also detected by a genome-wide association study in beef cattle. The ratio of number of SNP located in RFI QTL to the total number of SNP in the top 188 SNP chosen by RF was significantly higher than in all 42,275 whole-genome markers. Pathway analysis indicated that many of the top 188 SNP are in genomic regions that contain annotated genes with biological functions that may influence RFI. 
Frequently occurring ancestor-descendant SNP pairs can be explored as possible epistatic effects for further study. The importance scores generated by RF can be used effectively to identify SNP with large additive or epistatic effects, as well as informative QTL. The consistency between the results of our study and those of previous studies in beef cattle indicates that the genetic architecture of RFI in dairy cattle might be similar to that of beef cattle.
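
The two selection rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the cutoff is read as the absolute value of the most negative importance score, an interaction candidate is counted once for every split that sits below another SNP's split in the same tree, and trees are represented as hypothetical nested tuples `(snp, left, right)` with `None` for a leaf. The SNP names and scores are made up for illustration.

```python
from collections import Counter


def significant_snps(importance):
    """Keep SNPs whose importance exceeds |most negative importance|.

    Assumption: negative scores reflect noise only, so the largest
    negative magnitude is used as the noise threshold.
    """
    most_negative = min(min(importance.values()), 0.0)
    cutoff = abs(most_negative)
    kept = [snp for snp, score in importance.items() if score > cutoff]
    # Rank the surviving SNPs from most to least important.
    kept.sort(key=lambda snp: importance[snp], reverse=True)
    return kept, cutoff


def count_ancestor_descendant_pairs(trees):
    """Count (ancestor SNP, descendant SNP) pairs over all trees.

    Each tree is a nested tuple (snp, left, right); None marks a leaf.
    This representation is hypothetical, chosen for the example.
    """
    counts = Counter()

    def walk(node, ancestors):
        if node is None:
            return
        snp, left, right = node
        # Pair the current split with every distinct SNP above it.
        for ancestor in set(ancestors):
            if ancestor != snp:
                counts[(ancestor, snp)] += 1
        for child in (left, right):
            walk(child, ancestors + [snp])

    for tree in trees:
        walk(tree, [])
    return counts


# Made-up importance scores for four SNPs.
scores = {"snp_a": 0.9, "snp_b": 0.2, "snp_c": -0.3, "snp_d": 0.31}
selected, cutoff = significant_snps(scores)

# Two made-up trees from a forest.
trees = [
    ("snp_a", ("snp_d", None, None), ("snp_b", None, None)),
    ("snp_a", ("snp_d", ("snp_b", None, None), None), None),
]
pairs = count_ancestor_descendant_pairs(trees)
top_pairs = pairs.most_common(25)  # the abstract reports the 25 most frequent pairs
```

In a real analysis the importance scores and tree structures would come from a fitted forest; how the study computed the scores and tallied the pairs is not specified in the abstract, so the counting scheme above is only one plausible reading.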
