DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment.

Authors

Wright Erik S

Affiliations

Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53715, USA.

Wisconsin Institute for Discovery, University of Wisconsin-Madison, 330 N. Orchard St., Madison, WI, 53715, USA.

Publication information

BMC Bioinformatics. 2015 Oct 6;16:322. doi: 10.1186/s12859-015-0749-z.

DOI:10.1186/s12859-015-0749-z
PMID:26445311
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4595117/
Abstract

BACKGROUND

Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and then diminish steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments.

RESULTS

Two predictors based on local sequence context were assessed: (i) single sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be utilized to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets.
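As a rough illustration of predictor (ii), the sketch below shows one way a gap score could be modulated by the residues surrounding a candidate gap position and then used in a simple global alignment. It is a toy under stated assumptions, not DECIPHER's actual algorithm: the modifier values in `GAP_MODIFIER`, the window size, the clamp, and the function names `context_gap_scores` and `align_score` are all hypothetical and chosen only for demonstration.

```python
# Hypothetical per-residue modifiers (illustrative values, NOT taken from the paper).
# Positive values make a nearby gap more costly; negative values make it cheaper.
GAP_MODIFIER = {"G": -1.0, "P": -1.0, "N": -0.5, "D": -0.5, "S": -0.5,
                "W": 1.0, "C": 1.0, "F": 0.5, "I": 0.5, "L": 0.5, "V": 0.5}


def context_gap_scores(seq, base=-5.0, window=2):
    """Return one gap score per boundary of seq (len(seq) + 1 values).

    The score at boundary i is the base gap score adjusted by the modifiers
    of up to `window` residues on each side of that boundary.
    """
    scores = []
    for i in range(len(seq) + 1):
        lo, hi = max(0, i - window), min(len(seq), i + window)
        adjust = sum(GAP_MODIFIER.get(res, 0.0) for res in seq[lo:hi])
        # Clamp so that a gap never becomes free or rewarding.
        scores.append(min(base - adjust, -1.0))
    return scores


def align_score(a, b, match=2.0, mismatch=-1.0, base=-5.0, window=2):
    """Global alignment score (Needleman-Wunsch) with context-dependent gap scores."""
    gap_in_a = context_gap_scores(a, base, window)  # gap placed in a at boundary i
    gap_in_b = context_gap_scores(b, base, window)  # gap placed in b at boundary j
    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap_in_b[0]
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap_in_a[0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = F[i - 1][j] + gap_in_b[j]    # a[i-1] aligned to a gap in b
            left = F[i][j - 1] + gap_in_a[i]  # b[j-1] aligned to a gap in a
            F[i][j] = max(diag, up, left)
    return F[n][m]
```

With these illustrative values, a gap next to glycine-rich context is penalized less than one inside a tryptophan-rich stretch, which is the general idea of letting local context, rather than a single fixed penalty, decide where gaps are plausible.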

CONCLUSIONS

Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets. The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the Bioconductor repository.
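The idea of letting predicted secondary structure inform an alignment can be sketched as an extra term added to the usual substitution score. The example below is a minimal illustration under assumptions, not the scoring scheme used by DECIPHER: the three-state probabilities are presumed to come from some external predictor, and the names `ss_agreement`, `combined_score`, `average_profile` and the `weight` parameter are hypothetical.

```python
STATES = ("H", "E", "C")  # helix, strand, coil


def ss_agreement(p, q):
    """Probability that two residues share a state, assuming independent predictions."""
    return sum(p[s] * q[s] for s in STATES)


def combined_score(sub_score, p, q, weight=2.0):
    """Substitution score plus a weighted secondary-structure agreement bonus."""
    return sub_score + weight * ss_agreement(p, q)


def average_profile(columns):
    """Average the state probabilities of residues already aligned in one column.

    Averaging over more sequences is one simple way a per-column prediction
    can become less noisy as an alignment grows.
    """
    n = len(columns)
    return {s: sum(c[s] for c in columns) / n for s in STATES}


# Example: two residues with similar predicted structure get a larger bonus.
helix_like = {"H": 0.8, "E": 0.1, "C": 0.1}
mixed = {"H": 0.5, "E": 0.2, "C": 0.3}
bonus = combined_score(1.0, helix_like, mixed) - 1.0  # weight * agreement
```

Because the structure term depends on the residues around each position (through the predictor), it gives the alignment a source of context that a site-independent substitution matrix alone does not provide.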
