The relative inefficiency of sequence weights approaches in determining a nucleotide position weight matrix.

Authors

Newberg Lee A, McCue Lee Ann, Lawrence Charles E

Affiliation

NYSDOH Wadsworth Center & Rensselaer Polytechnic Institute Department of Computer Science.

Publication

Stat Appl Genet Mol Biol. 2005;4:Article13. doi: 10.2202/1544-6115.1135. Epub 2005 Jun 1.

DOI:10.2202/1544-6115.1135
PMID:16646830
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1479456/
Abstract

Approaches that use sequence weights to construct a position weight matrix of nucleotides from aligned inputs are popular, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these weights, we find that approaches based upon sequence weights can perform very poorly in comparison to approaches based upon a theoretically optimal maximum-likelihood method in the inference of the parameters of a position weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Application of this ordering for the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that the use of weighted estimators on these data seriously limits the utility of obtaining the sequences of more than two or three additional species.
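
The two ideas in the abstract, variance-minimizing sequence weights and a greedy ordering of species, can be illustrated with a minimal sketch. It is not the authors' derivation, which works from a substitution model on a phylogenetic tree. The sketch assumes instead that the covariance matrix between the per-sequence estimators is already known, and it uses the standard result that the weights summing to one that minimize `w @ cov @ w` are proportional to `inv(cov) @ 1`. The matrix `cov` and the function names `optimal_weights`, `weighted_variance`, and `greedy_order` are hypothetical and chosen only for illustration.

```python
import numpy as np


def optimal_weights(cov):
    """Weights summing to 1 that minimize the variance w @ cov @ w."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # x = inv(cov) @ 1, without forming the inverse
    return x / x.sum()


def weighted_variance(cov, w):
    """Variance of the weighted estimator under covariance matrix cov."""
    return float(w @ cov @ w)


def greedy_order(cov, start):
    """Starting from one species, repeatedly add the species that gives the
    lowest variance for the optimally weighted estimator."""
    chosen = [start]
    remaining = [i for i in range(cov.shape[0]) if i != start]
    curve = [float(cov[start, start])]  # variance with the start species alone

    def variance_with(candidate):
        idx = chosen + [candidate]
        sub = cov[np.ix_(idx, idx)]
        return weighted_variance(sub, optimal_weights(sub))

    while remaining:
        best = min(remaining, key=variance_with)
        curve.append(variance_with(best))
        chosen.append(best)
        remaining.remove(best)
    return chosen, curve


# Hypothetical covariance matrix (an assumption, not data from the paper):
# species 0 and 1 are close relatives, species 2 and 3 are more distant.
cov = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.2, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

weights = optimal_weights(cov)
order, curve = greedy_order(cov, start=0)
print("weights:", np.round(weights, 3))
print("greedy order:", order)
print("variance curve:", np.round(curve, 3))
```

In this toy setting the closely related pair shares its weight, and the greedy ordering picks distant species before near-duplicates. The variance curve can never rise, because a newly added species can always be given zero weight. A long flat stretch in such a curve is the kind of plateau the abstract describes for weighted estimators.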
