Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers.

Authors

Long N, Gianola D, Rosa G J M, Weigel K A, Avendaño S

Affiliation

Department of Animal Sciences, University of Wisconsin, Madison, WI 53706, USA.

Publication

J Anim Breed Genet. 2007 Dec;124(6):377-89. doi: 10.1111/j.1439-0388.2007.00694.x.

DOI:10.1111/j.1439-0388.2007.00694.x
PMID:18076475
Abstract

Genome-wide association studies using single nucleotide polymorphisms (SNPs) can identify genetic variants related to complex traits. Typically thousands of SNPs are genotyped, whereas the number of phenotypes for which there is genomic information may be smaller. When predicting phenotypes, options for statistical model building range from incorporating all possible markers into the specification to including only sets of relevant SNPs (features). In the latter case, an efficient method of selecting influential features is required. A two-step feature selection method for binary traits was developed, which consisted of filtering (using information gain), and wrapping (using naïve Bayesian classification). The filter reduces the large number of SNPs to a much smaller size, to facilitate the wrapper step. As the procedure is tailored for discrete outcomes, an approach based on discretization of phenotypic values was developed, to enable feature selection in a classification framework. The method was applied to chick mortality rates (0-14 days of age) on progeny from 201 sires in a commercial broiler line, with the goal of identifying SNPs (over 5000) related to progeny mortality. To mimic a case-control study, sires were clustered into two groups, low and high, according to two arbitrarily chosen mortality rate cut points. By varying these thresholds, 11 different 'case-control' samples were formed, and the SNP selection procedure was applied to each sample. To compare the 11 sets of chosen SNPs, predicted residual sum of squares (PRESS) from a linear model was used. The two-step method improved naïve Bayesian classification accuracy over the case without feature selection (from around 50 to above 90% without and with feature selection in each case-control sample). The best case-control group (63 sires above or below the thresholds) had the smallest PRESS statistic among groups with model p-values below 0.003. The 17 SNPs selected using this group accounted for 31% of the variation in raw mortality rates between sire families.
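
The two-step procedure described in the abstract (discretize sire mortality rates into low and high classes, filter SNPs by information gain, then wrap with a naïve Bayes classifier) can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names, the synthetic genotypes, the cut points (0.03 and 0.07), the greedy forward search, the leave-one-out accuracy criterion, and the Laplace smoothing are all assumptions, since the abstract does not specify them.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete SNP column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def discretize(rates, low_cut, high_cut):
    """Return (index, class) pairs: 0 if rate <= low_cut, 1 if rate >= high_cut.
    Sires between the two cut points are dropped (an assumed convention)."""
    return [(i, 0 if r <= low_cut else 1)
            for i, r in enumerate(rates) if r <= low_cut or r >= high_cut]

def nb_loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a categorical naive Bayes classifier
    restricted to the SNP columns in `cols`, with Laplace smoothing."""
    correct = 0
    for i in range(len(y)):
        train = [j for j in range(len(y)) if j != i]
        best_class, best_score = None, -math.inf
        for c in (0, 1):
            rows = [j for j in train if y[j] == c]
            score = math.log((len(rows) + 1) / (len(train) + 2))
            for col in cols:
                matches = sum(1 for j in rows if X[j][col] == X[i][col])
                score += math.log((matches + 1) / (len(rows) + 3))  # 3 genotype codes
            if score > best_score:
                best_class, best_score = c, score
        correct += best_class == y[i]
    return correct / len(y)

def two_step_select(X, y, n_filter=10):
    """Filter: keep the n_filter SNPs with highest information gain.
    Wrapper: greedy forward selection maximizing naive Bayes LOO accuracy."""
    n_snps = len(X[0])
    gains = [information_gain([row[s] for row in X], y) for s in range(n_snps)]
    candidates = sorted(range(n_snps), key=lambda s: gains[s], reverse=True)[:n_filter]
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved, best_snp = False, None
        for s in candidates:
            if s in selected:
                continue
            acc = nb_loo_accuracy(X, y, selected + [s])
            if acc > best_acc:
                best_acc, best_snp, improved = acc, s, True
        if improved:
            selected.append(best_snp)
    return selected, best_acc

# Synthetic example (hypothetical data): 60 sires, 30 SNPs coded 0/1/2.
# SNP 3 drives the mortality rate; all other SNPs are random noise.
rng = random.Random(0)
genotypes = [[rng.randint(0, 2) for _ in range(30)] for _ in range(60)]
for i, row in enumerate(genotypes):
    row[3] = i % 3
rates = [0.02 + 0.03 * row[3] + rng.uniform(-0.005, 0.005) for row in genotypes]

kept = discretize(rates, low_cut=0.03, high_cut=0.07)
X = [genotypes[i] for i, _ in kept]
y = [label for _, label in kept]
selected, acc = two_step_select(X, y)
print("selected SNPs:", selected, "LOO accuracy:", acc)
```

Varying `low_cut` and `high_cut` and rerunning the selection would produce different 'case-control' samples, in the spirit of the 11 samples the abstract describes; comparing the resulting SNP sets (for example with PRESS from a linear model) is left out of this sketch.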
