Relating HIV-1 sequence variation to replication capacity via trees and forests.

Author information

Segal Mark R, Barbour Jason D, Grant Robert M

Affiliation

University of California, San Francisco, USA.

Publication information

Stat Appl Genet Mol Biol. 2004;3:Article2; discussion article 7, article 9. doi: 10.2202/1544-6115.1031. Epub 2004 Feb 12.

DOI:10.2202/1544-6115.1031

PMID:16646798

Abstract

The problem of relating genotype (as represented by amino acid sequence) to phenotype is distinguished from standard regression problems by the nature of sequence data. Here we investigate an instance of such a problem where the phenotype of interest is HIV-1 replication capacity and contiguous segments of protease and reverse transcriptase sequence constitute the genotype. A variety of data analytic methods have been proposed in this context. Shortcomings of select techniques are contrasted with the advantages afforded by tree-structured methods. However, tree-structured methods have, in turn, been criticized for achieving only modest predictive performance. A number of ensemble approaches (bagging, boosting, random forests) have recently emerged, devised to overcome this deficiency. We evaluate random forests as applied in this setting, and detail why prediction gains obtained in other situations are not realized. Other approaches, including logic regression, support vector machines and neural networks, are also applied. We interpret results in terms of HIV-1 reverse transcriptase structure and function.

Similar articles

Prediction of genomewide conserved epitope profiles of HIV-1: classifier choice and peptide representation.

Stat Appl Genet Mol Biol. 2005;4:Article25. doi: 10.2202/1544-6115.1158. Epub 2005 Sep 16.

Contemporary QSAR classifiers compared.

J Chem Inf Model. 2007 Jan-Feb;47(1):219-27. doi: 10.1021/ci600332j.

Genetic basis of variation in tenofovir drug susceptibility in HIV-1.

AIDS. 2008 Jun 19;22(10):1113-23. doi: 10.1097/QAD.0b013e32830184a1.

QSAR models for 2-amino-6-arylsulfonylbenzonitriles and congeners HIV-1 reverse transcriptase inhibitors based on linear and nonlinear regression methods.

Eur J Med Chem. 2009 May;44(5):2158-71. doi: 10.1016/j.ejmech.2008.10.021. Epub 2008 Oct 30.

Alternative methods to evaluate trial level surrogacy.

Clin Trials. 2008;5(3):194-208. doi: 10.1177/1740774508091677.

Advantages of predicted phenotypes and statistical learning models in inferring virological response to antiretroviral therapy from HIV genotype.

Antivir Ther. 2009;14(2):273-83.

Multiple testing and data adaptive regression: an application to HIV-1 sequence data.

Stat Appl Genet Mol Biol. 2005;4:Article8. doi: 10.2202/1544-6115.1110. Epub 2005 Apr 18.

A working guide to boosted regression trees.

J Anim Ecol. 2008 Jul;77(4):802-13. doi: 10.1111/j.1365-2656.2008.01390.x. Epub 2008 Apr 8.

Review on modelling aspects in reversed-phase liquid chromatographic quantitative structure-retention relationships.

Anal Chim Acta. 2007 Oct 29;602(2):164-72. doi: 10.1016/j.aca.2007.09.014. Epub 2007 Sep 15.

Cited by

A brain and a head for a different habitat: Size variation in four morphs of Arctic charr (Salvelinus alpinus (L.)) in a deep oligotrophic lake.

Ecol Evol. 2020 Sep 25;10(20):11335-11351. doi: 10.1002/ece3.6771. eCollection 2020 Oct.

Inference of clonal selection in cancer populations using single-cell sequencing data.

Bioinformatics. 2019 Jul 15;35(14):i398-i407. doi: 10.1093/bioinformatics/btz392.

Inferring genetic interactions from comparative fitness data.

Elife. 2017 Dec 20;6:e28629. doi: 10.7554/eLife.28629.

Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US: A Machine Learning Approach.

AIDS Behav. 2017 Feb;21(2):534-546. doi: 10.1007/s10461-016-1628-y.

Supervised learning methods in modeling of CD4+ T cell heterogeneity.

BioData Min. 2015 Sep 4;8:27. doi: 10.1186/s13040-015-0060-6. eCollection 2015.

A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory.

Genetics. 2015 Jan;199(1):191-203. doi: 10.1534/genetics.114.172312. Epub 2014 Nov 17.

Estimating HIV-1 fitness characteristics from cross-sectional genotype data.

PLoS Comput Biol. 2014 Nov 6;10(11):e1003886. doi: 10.1371/journal.pcbi.1003886. eCollection 2014 Nov.

Integrative analysis using module-guided random forests reveals correlated genetic factors related to mouse weight.

PLoS Comput Biol. 2013;9(3):e1002956. doi: 10.1371/journal.pcbi.1002956. Epub 2013 Mar 7.

The peaks and geometry of fitness landscapes.

J Theor Biol. 2013 Jan 21;317:1-10. doi: 10.1016/j.jtbi.2012.09.028. Epub 2012 Oct 2.

Random forests for genomic data analysis.

Genomics. 2012 Jun;99(6):323-9. doi: 10.1016/j.ygeno.2012.04.003. Epub 2012 Apr 21.
