Relating HIV-1 sequence variation to replication capacity via trees and forests.

Author information

Segal Mark R, Barbour Jason D, Grant Robert M

Affiliation

University of California, San Francisco, USA.

Publication information

Stat Appl Genet Mol Biol. 2004;3:Article2; discussion article 7, article 9. doi: 10.2202/1544-6115.1031. Epub 2004 Feb 12.

Abstract

The problem of relating genotype (as represented by amino acid sequence) to phenotype is distinguished from standard regression problems by the nature of sequence data. Here we investigate an instance of such a problem where the phenotype of interest is HIV-1 replication capacity and contiguous segments of protease and reverse transcriptase sequence constitute the genotype. A variety of data analytic methods have been proposed in this context. Shortcomings of selected techniques are contrasted with the advantages afforded by tree-structured methods. However, tree-structured methods have, in turn, been criticized for achieving only modest predictive performance. A number of ensemble approaches (bagging, boosting, random forests) have recently emerged, devised to overcome this deficiency. We evaluate random forests as applied in this setting, and detail why prediction gains obtained in other situations are not realized. Other approaches, including logic regression, support vector machines and neural networks, are also applied. We interpret results in terms of HIV-1 reverse transcriptase structure and function.
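As a rough illustration of how a tree-structured method can operate on sequence data, the sketch below codes aligned amino acid sequences as (position, residue) indicators and searches for the single split that most reduces squared error in a continuous phenotype. This is not the authors' code or data: the three-residue sequences, the replication capacity values, and the function names `indicator_features`, `sse` and `best_split` are all invented for illustration. A full regression tree would apply the same search recursively, and a random forest would average many such trees grown on bootstrap samples.

```python
from statistics import mean

def indicator_features(sequences):
    """Map aligned sequences to binary (position, residue) indicator features."""
    features = sorted({(i, aa) for seq in sequences for i, aa in enumerate(seq)})
    rows = [[1 if seq[i] == aa else 0 for (i, aa) in features] for seq in sequences]
    return features, rows

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(features, rows, y):
    """Find the indicator whose presence/absence split minimizes total SSE.

    Returns (feature, score, mean_with, mean_without), or None if no
    indicator separates the samples into two non-empty groups.
    """
    best = None
    for j, feat in enumerate(features):
        with_residue = [yi for row, yi in zip(rows, y) if row[j] == 1]
        without_residue = [yi for row, yi in zip(rows, y) if row[j] == 0]
        if not with_residue or not without_residue:
            continue  # indicator is constant across samples, so no split
        score = sse(with_residue) + sse(without_residue)
        if best is None or score < best[1]:
            best = (feat, score, mean(with_residue), mean(without_residue))
    return best

# Hypothetical toy data: invented sequences and replication capacity values.
sequences = ["MKV", "MKV", "LKV", "LKI", "MRV", "LRI"]
replication_capacity = [0.9, 1.0, 0.4, 0.3, 0.95, 0.35]

features, rows = indicator_features(sequences)
feature, score, mean_with, mean_without = best_split(
    features, rows, replication_capacity
)
print(f"split on position {feature[0]}, residue {feature[1]}: "
      f"mean with = {mean_with:.2f}, mean without = {mean_without:.2f}")
```

In this toy example the chosen split falls on position 0, where the invented phenotype values differ most between residues. Because each split is a statement about a specific residue at a specific position, the result can be read directly against protein structure, which is the kind of interpretability the abstract attributes to tree-structured methods.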
