A phylogenetic approach for weighting genetic sequences.

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.

Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA.

Publication information

BMC Bioinformatics. 2021 May 28;22(1):285. doi: 10.1186/s12859-021-04183-8.

DOI:10.1186/s12859-021-04183-8
PMID:34049487
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8164272/
Abstract

BACKGROUND

Many important applications in bioinformatics, including sequence alignment and protein family profiling, employ sequence weighting schemes to mitigate the effects of non-independence of homologous sequences and under- or over-representation of certain taxa in a dataset. These schemes aim to assign high weights to sequences that are 'novel' compared to the others in the same dataset, and low weights to sequences that are over-represented.
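
To make the weighting principle concrete, here is a minimal sketch of one well-known existing scheme, the position-based weights of Henikoff and Henikoff. It is not the method introduced in this paper; it only illustrates how an over-represented sequence ends up with a lower weight than a divergent one. The function name, the toy alignment, and the choice to treat every character (including any gap symbol) as an ordinary state are assumptions made for illustration.

```python
from collections import Counter


def position_based_weights(alignment):
    """Henikoff-style position-based weights (illustrative sketch).

    `alignment` is a list of equal-length aligned sequences.
    Returns one weight per sequence, normalised to sum to 1.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")

    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        k = len(counts)  # number of distinct characters in this column
        for i, ch in enumerate(column):
            # A character shared by many sequences contributes little to each.
            weights[i] += 1.0 / (k * counts[ch])

    total = sum(weights)
    return [w / total for w in weights]


# Toy alignment (hypothetical): three identical sequences and one divergent one.
alignment = ["ACGT", "ACGT", "ACGT", "TCGA"]
weights = position_based_weights(alignment)
```

In this toy example the three identical sequences share their weight, while the divergent fourth sequence receives the largest individual weight.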

RESULTS

We formalise this principle by rigorously defining the evolutionary 'novelty' of a sequence within an alignment. This results in new sequence weights that we call 'phylogenetic novelty scores'. These scores have various desirable properties, and we showcase their use by considering, as an example application, the inference of character frequencies at an alignment column (important, for example, in protein family profiling). We give computationally efficient algorithms for calculating our scores and, using simulations, show that they are versatile and can improve the accuracy of character frequency estimation compared to existing sequence weighting schemes.
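
The example application mentioned above, estimating character frequencies at an alignment column, can be sketched as a weighted count. The abstract does not give the definition of the phylogenetic novelty scores, so the weights below are hypothetical placeholders rather than scores computed by the authors' algorithms; the function name and the toy column are likewise assumptions.

```python
from collections import defaultdict


def weighted_frequencies(column, weights):
    """Estimate character frequencies at one alignment column.

    `column` holds one character per sequence; `weights` holds one
    non-negative weight per sequence (any weighting scheme may supply them).
    """
    if len(column) != len(weights):
        raise ValueError("need exactly one weight per sequence")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    freqs = defaultdict(float)
    for ch, w in zip(column, weights):
        freqs[ch] += w  # each sequence votes with its weight
    return {ch: f / total for ch, f in freqs.items()}


column = "AAAG"  # three closely related sequences and one distinct one

# Unweighted estimate: every sequence counts equally.
uniform = weighted_frequencies(column, [1.0, 1.0, 1.0, 1.0])

# Hypothetical weights that down-weight the three over-represented sequences.
reweighted = weighted_frequencies(column, [0.5, 0.5, 0.5, 1.5])
```

With equal weights the estimate is dominated by the three similar sequences; with the placeholder weights, the distinct sequence contributes as much as the other three combined.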

CONCLUSIONS

Our phylogenetic novelty scores can be useful when an evolutionarily meaningful system for adjusting for uneven taxon sampling is desired. They have numerous possible applications, including estimation of evolutionary conservation scores and sequence logos, identification of targets in conservation biology, and improving and measuring sequence alignment accuracy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf16/8164272/40db5819ee6e/12859_2021_4183_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf16/8164272/e50bb054542d/12859_2021_4183_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf16/8164272/a734188fff0d/12859_2021_4183_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf16/8164272/0bb2f121c752/12859_2021_4183_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf16/8164272/45c30e957cdf/12859_2021_4183_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf16/8164272/1823bbdd8aac/12859_2021_4183_Fig5_HTML.jpg
