Boosting alternating decision trees modeling of disease trait information.

Affiliation

HCNR Center for Bioinformatics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02215, USA.

Publication information

BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S132. doi: 10.1186/1471-2156-6-S1-S132.

DOI:10.1186/1471-2156-6-S1-S132
PMID:16451591
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1866804/
Abstract

We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies formed by combining two boosting loss functions (log or exponential) with two tree generation types (a full alternating decision tree or a classic boosting decision tree). These four strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute an average prediction score for each subject with a specific trait profile, this process was repeated over 10 runs of 10-fold cross-validation; the standardized prediction scores from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses on the basis of the four models (a full alternating decision tree and a classic boosting decision tree, each paired with either the log or the exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9.
Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as their relevancy measurements, we found all relevant phenotypes for all four populations except phenotype b for the Karangar population, with suggested subgroup structure consistent with latent traits used in the model. In conclusion, our findings suggest that the ADTrees method may offer a more accurate representation of the disease status that allows for better detection of linkage evidence.
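
The scoring step described in the abstract (repeated 10-fold cross-validation, per-run standardization, then averaging) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it swaps in plain AdaBoost over single-feature stumps with exponential loss in place of a full alternating decision tree, it scores every subject out-of-fold rather than training on founders only, and the toy data, the labeling rule, and all function names (`train_adaboost_stumps`, `score`, `averaged_scores`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev

def train_adaboost_stumps(X, y, rounds=10):
    """AdaBoost (exponential loss) over one-feature stumps on binary inputs.
    X: list of 0/1 feature lists; y: labels in {-1, +1}.
    Returns a list of (feature_index, alpha) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            # Stump votes +1 when feature j is set, else -1.
            # A negative alpha later flips the vote if the stump is anti-correlated.
            err = sum(wi for wi, xi, yi in zip(w, X, y)
                      if (1 if xi[j] else -1) != yi)
            if best is None or abs(0.5 - err) > abs(0.5 - best[1]):
                best = (j, err)
        j, err = best
        err = min(max(err, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, alpha))
        # Reweight: misclassified subjects gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * (1 if xi[j] else -1))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    """Real-valued prediction score: the weighted sum of stump votes."""
    return sum(alpha * (1 if x[j] else -1) for j, alpha in model)

def averaged_scores(X, y, runs=10, folds=10, rounds=10, seed=0):
    """Average of per-run standardized held-out scores for every subject."""
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * n
    for _ in range(runs):
        order = list(range(n))
        rng.shuffle(order)
        raw = [0.0] * n
        for f in range(folds):
            test = order[f::folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            model = train_adaboost_stumps([X[i] for i in train],
                                          [y[i] for i in train], rounds)
            for i in test:
                raw[i] = score(model, X[i])
        mu = mean(raw)
        sd = pstdev(raw) or 1.0  # guard against a constant score vector
        for i in range(n):
            totals[i] += (raw[i] - mu) / sd
    return [t / runs for t in totals]

# Made-up toy data: 13 binary inputs per subject (standing in for the
# 12 phenotypes plus sex) and an arbitrary rule for disease status.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(13)] for _ in range(60)]
y = [1 if sum(x[:3]) >= 2 else -1 for x in X]
scores = averaged_scores(X, y)
```

The resulting `scores` list plays the role of the derived quantitative trait: one averaged, standardized value per subject that a downstream linkage analysis could take as its phenotype.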

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e731/1866804/9922ae93df49/1471-2156-6-S1-S132-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e731/1866804/677cebc44300/1471-2156-6-S1-S132-1.jpg
