PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination.

Affiliation

Department of Botany, The Field Museum, 1400 South Lake Shore Drive, Chicago, IL 60605-2496, USA.

Publication details

BMC Bioinformatics. 2011 Jan 7;12:10. doi: 10.1186/1471-2105-12-10.

DOI: 10.1186/1471-2105-12-10
PMID: 21214904
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3024941/
Abstract

BACKGROUND

We present a novel method, 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord), for encoding ambiguously aligned regions in fixed multiple sequence alignments. The method ordinates matrices of sequence identity or cost scores by means of Principal Coordinates Analysis (PCoA). After ambiguous regions have been identified, the method computes pairwise distances as sequence identities or cost scores, ordinates the resulting distance matrix by PCoA, and encodes the principal coordinates as ordered integers. Three biological datasets and 100 simulated datasets were used to assess the performance of the new method.
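To make the ordination and encoding steps concrete, here is a minimal Python sketch using only the standard library. It assumes a symmetric pairwise distance matrix has already been computed for one ambiguous region, performs classical PCoA (Gower double-centering, then power iteration with a Gershgorin shift in place of a full eigendecomposition), and rescales each axis linearly onto ordered integers. The helper names `pcoa` and `encode_ordered` and the linear rescaling onto `n_states` integers are illustrative assumptions, not the published PICS-Ord implementation, which runs in R.

```python
import math
import random


def double_center(dist):
    """Gower centering: B = -1/2 * J D^2 J, with J the centering matrix."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]  # row means (equal to column means for symmetric D)
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def pcoa(dist, n_axes=1, iters=2000, seed=1):
    """Return up to n_axes principal-coordinate axes, one list per axis."""
    b = double_center(dist)
    n = len(b)
    rng = random.Random(seed)
    axes = []
    for _ in range(n_axes):
        # Shift by a Gershgorin bound so power iteration converges to the most
        # positive eigenvalue even when B has negative (non-Euclidean) eigenvalues.
        shift = max(sum(abs(x) for x in row) for row in b)
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue of B itself (not of the shifted matrix).
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no further positive axes
        if max(v, key=abs) < 0:  # eigenvector sign is arbitrary; fix it for reproducibility
            v = [-x for x in v]
        axes.append([math.sqrt(lam) * x for x in v])
        # Deflate so the next pass finds the next axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return axes


def encode_ordered(axis, n_states=32):
    """Linearly rescale one axis onto the ordered integers 0..n_states-1 (assumed scheme)."""
    lo, hi = min(axis), max(axis)
    if hi - lo < 1e-12:
        return [0] * len(axis)
    return [int((x - lo) / (hi - lo) * (n_states - 1) + 0.5) for x in axis]
```

For example, with taxa placed at positions 0, 1, 3, and 6 on a line and `d` built as `[[abs(p - q) for q in pos] for p in pos]`, `pcoa(d)` recovers the centered positions (-2.5, -1.5, 0.5, 3.5), and `encode_ordered(axis, n_states=7)` returns `[0, 1, 3, 6]`. Real identity or cost distances are rarely exactly Euclidean, which is why the sketch shifts the matrix and stops at the first non-positive eigenvalue.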

RESULTS

Including ambiguous regions coded by means of PICS-Ord increased topological accuracy, resolution, and bootstrap support in both real biological and simulated datasets, compared with excluding such regions from the analysis a priori. In terms of accuracy, PICS-Ord performs as well as or better than previously available methods of ambiguous-region coding (e.g., INAASE), while offering a practically unlimited alignment size, faster analysis, and the option of analyzing PICS-Ord scores together with DNA data in a partitioned maximum likelihood model.
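Analyzing PICS-Ord scores alongside DNA data means appending one extra character column per coded region and treating those columns as their own partition. The Python sketch below shows that bookkeeping under stated assumptions: integer codes are mapped to single symbols using an assumed 32-symbol alphabet (digits 0-9, then A-V), matching the 32-state limit noted in the Conclusions and the convention we understand RAxML to use for multi-state data; confirm it against the manual for your RAxML version. The helper names and partition labels are hypothetical, and the program-specific partition-file syntax is deliberately not shown.

```python
# Assumed 32-symbol alphabet for multi-state characters: digits 0-9, then A-V.
STATE_SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def codes_to_symbols(codes):
    """Map integer codes (0..31) to one character each."""
    for c in codes:
        if not 0 <= c < len(STATE_SYMBOLS):
            raise ValueError(f"code {c} outside 0..{len(STATE_SYMBOLS) - 1}")
    return "".join(STATE_SYMBOLS[c] for c in codes)


def combine_partitions(dna, pics_codes):
    """Append PICS-Ord code columns to an aligned DNA matrix.

    dna:        taxon -> aligned DNA string (ambiguous regions excluded)
    pics_codes: taxon -> list of integer codes, one per coded region
    Returns (rows, partitions); partitions hold 1-based inclusive column ranges.
    """
    if set(dna) != set(pics_codes):
        raise ValueError("both partitions must contain the same taxa")
    dna_len = {len(s) for s in dna.values()}
    n_regions = {len(c) for c in pics_codes.values()}
    if len(dna_len) != 1 or len(n_regions) != 1:
        raise ValueError("every taxon needs the same number of columns")
    d, k = dna_len.pop(), n_regions.pop()
    rows = {taxon: dna[taxon] + codes_to_symbols(pics_codes[taxon]) for taxon in dna}
    partitions = [("dna", 1, d), ("pics_ord", d + 1, d + k)]
    return rows, partitions
```

For two taxa with four DNA columns and two coded regions, `combine_partitions({"t1": "ACGT", "t2": "ACGA"}, {"t1": [0, 31], "t2": [10, 5]})` yields rows `ACGT0V` and `ACGAA5`, with the DNA partition at columns 1-4 and the PICS-Ord partition at columns 5-6.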

CONCLUSIONS

Advantages of PICS-Ord over step-matrix-based ambiguous-region coding with INAASE include a practically unlimited number of OTUs, seamless integration of PICS-Ord codes into phylogenetic datasets, and faster phylogenetic analysis. Unlike word- and frequency-based methods, PICS-Ord retains the advantage of deriving distances from pairwise sequence alignments, and the method is flexible with respect to how distance scores are calculated. In addition to distance and maximum parsimony methods, PICS-Ord codes can be analyzed in a Bayesian or maximum likelihood framework. RAxML version 7.2.6 (developed for this study) and later versions support ordered or unordered characters with up to 32 states. A GTR, MK, or ORDERED model can be applied to the PICS-Ord code partition, with GTR performing slightly better than MK and ORDERED.
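PICS-Ord derives its distances from pairwise alignments; the published implementation uses Ngila, which supports logarithmic and affine gap costs. As a hedged stand-in, the Python sketch below uses a textbook Needleman-Wunsch global alignment with a linear gap cost, so its scores will differ from Ngila's. The mismatch and gap costs are arbitrary illustrative values, and defining identity as the fraction of identical columns over the full alignment length is one assumed convention among several. The sketch exposes both distance options named in the abstract: the alignment cost itself and one minus the identity.

```python
def nw_align(s, t, mismatch=1, gap=2):
    """Minimum-cost global alignment (Needleman-Wunsch, linear gap cost).

    Returns (cost, identical_columns, alignment_length).
    """
    n, m = len(s), len(t)
    # cost[i][j] = minimum cost of aligning s[:i] with t[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + gap,
                             cost[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total length.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                if s[i - 1] == t[j - 1]:
                    matches += 1
                i, j, length = i - 1, j - 1, length + 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            i -= 1  # gap in t
        else:
            j -= 1  # gap in s
        length += 1
    return cost[n][m], matches, length


def identity_distance(s, t):
    """1 - (identical columns / alignment length); 0.0 for two empty strings."""
    _, matches, length = nw_align(s, t)
    return 1.0 - matches / length if length else 0.0


def cost_distance(s, t):
    """Use the optimal alignment cost directly as the distance."""
    return nw_align(s, t)[0]


def distance_matrix(seqs, metric):
    """Full pairwise matrix for the sequences of one ambiguous region."""
    return [[metric(a, b) for b in seqs] for a in seqs]
```

Either `distance_matrix(region, identity_distance)` or `distance_matrix(region, cost_distance)` produces the matrix that the ordination step takes as input. Swapping in another metric only requires a different function with the same two-argument shape, which mirrors the flexibility in distance calculation described above.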

AVAILABILITY

An implementation of the PICS-Ord algorithm is available from http://scit.us/projects/ngila/wiki/PICS-Ord. It requires both the statistical software R (http://www.r-project.org) and the alignment software Ngila (http://scit.us/projects/ngila).
