Combining evidence using p-values: application to sequence homology searches.

Authors

Bailey T L, Gribskov M

Affiliation

San Diego Supercomputer Center, CA 92186-9784, USA.

Publication

Bioinformatics. 1998;14(1):48-54. doi: 10.1093/bioinformatics/14.1.48.

DOI: 10.1093/bioinformatics/14.1.48
PMID: 9520501
Abstract

MOTIVATION

To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.
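The motivation rests on the fact that the raw product of several p-values is not itself a p-value. The minimal Python sketch below simulates the null case, assuming every p-value is an independent Uniform(0, 1) draw, and counts how often the product falls at or below a threshold q. The function name `empirical_tail` and its parameters are hypothetical illustrations, not taken from the paper.

```python
import random


def empirical_tail(n: int, q: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(p_1 * ... * p_n <= q) when every p_i is an independent
    Uniform(0, 1) draw, i.e. when the null hypothesis holds for all n sources.

    Hypothetical helper for illustration; not the paper's code."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        product = 1.0
        for _ in range(n):
            product *= rng.random()  # one null p-value per evidence source
        if product <= q:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # One source: the product is the p-value itself, so this is close to q.
    print(empirical_tail(1, 0.05))
    # Three sources: the raw product is <= 0.05 far more often than 5% of the time.
    print(empirical_tail(3, 0.05))
```

With n = 3 and q = 0.05 the estimate should come out near 0.42 rather than 0.05, so treating the raw product as a p-value would greatly overstate the significance of the combined evidence.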

RESULTS

In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
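The abstract does not reproduce the formula. A standard closed form for the distribution of a product of n independent Uniform(0, 1) p-values is P(p_1 ⋯ p_n ≤ q) = q · Σ_{i=0}^{n-1} (−ln q)^i / i!. It follows because each −ln p_i is exponentially distributed with mean 1, so their sum has a Gamma(n, 1) distribution (equivalently, −2 ln of the product is χ² with 2n degrees of freedom, as in Fisher's method). The sketch below assumes this is the quantity QFAST evaluates. The names `qfast` and `rank_by_combined_p` and the per-motif p-values are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import Dict, List, Sequence


def qfast(p_values: Sequence[float]) -> float:
    """P(product of n independent Uniform(0,1) p-values <= observed product),
    using P = q * sum_{i=0}^{n-1} (-ln q)^i / i!  (assumed QFAST formula)."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Sum logs instead of multiplying, so long products stay accurate.
    log_q = sum(math.log(p) for p in p_values)
    x = -log_q  # x >= 0 because every p <= 1
    # Accumulate the terms x^i / i! for i = 0 .. n-1 incrementally.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term
    # Guard against rounding pushing a probability above 1.
    return min(1.0, math.exp(log_q) * total)


def rank_by_combined_p(hits: Dict[str, Sequence[float]]) -> List[str]:
    """Order sequence names from most to least significant combined p-value."""
    return sorted(hits, key=lambda name: qfast(hits[name]))


# Hypothetical per-motif p-values for three sequences (illustrative only).
example_hits = {
    "seqA": [1e-4, 0.02, 0.3],
    "seqB": [0.01, 0.01, 0.01],
    "seqC": [0.5, 0.4, 0.9],
}

if __name__ == "__main__":
    for name in rank_by_combined_p(example_hits):
        print(f"{name}\t{qfast(example_hits[name]):.3g}")
```

For a fixed number of p-values the combined p-value is monotone in the raw product, so the ordering only changes when sequences contribute different numbers of motif matches; the conversion mainly matters for reporting a calibrated significance level.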

Similar articles

1. Combining evidence using p-values: application to sequence homology searches.
Bioinformatics. 1998;14(1):48-54. doi: 10.1093/bioinformatics/14.1.48.
2. A test for the statistical significance of DNA sequence similarities for application in databank searches.
Comput Appl Biosci. 1989 Apr;5(2):123-31. doi: 10.1093/bioinformatics/5.2.123.
3. Methods and statistics for combining motif match scores.
J Comput Biol. 1998 Summer;5(2):211-21. doi: 10.1089/cmb.1998.5.211.
4. Matching among multiple random sequences.
Bull Math Biol. 1997 May;59(3):483-96. doi: 10.1007/BF02459461.
5. Score distributions for simultaneous matching to multiple motifs.
J Comput Biol. 1997 Spring;4(1):45-59. doi: 10.1089/cmb.1997.4.45.
6. Estimating statistical significance of sequence alignments.
Philos Trans R Soc Lond B Biol Sci. 1994 Jun 29;344(1310):383-90. doi: 10.1098/rstb.1994.0077.
7. PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W535-9. doi: 10.1093/nar/gki423.
8. Characterizing the D2 statistic: word matches in biological sequences.
Stat Appl Genet Mol Biol. 2009;8:Article 43. doi: 10.2202/1544-6115.1447. Epub 2009 Oct 8.
9. On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
10. Tests for the statistical significance of protein sequence similarities in data-bank searches.
Protein Eng. 1990 Dec;4(2):149-54. doi: 10.1093/protein/4.2.149.

Cited by

1. Analysis and comparison of the bacterial σ54 regulon: Evidence of phylogenetic trends in gene regulation.
PLoS One. 2025 Aug 1;20(8):e0327805. doi: 10.1371/journal.pone.0327805. eCollection 2025.
2. The Transcription Factor CaNAC81 Is Involved in the Carotenoid Accumulation in Chili Pepper Fruits.
Plants (Basel). 2025 Jul 8;14(14):2099. doi: 10.3390/plants14142099.
3. Jpx RNA controls Xist induction through spatial reorganization of the mouse X-inactivation center.
Dev Cell. 2025 Jul 11. doi: 10.1016/j.devcel.2025.06.028.
4. Comparative Genomic Analysis of COMT Family Genes in Three Species Reveals Evolutionary Relationships and Functional Divergence.
Plants (Basel). 2025 Jul 7;14(13):2079. doi: 10.3390/plants14132079.
5. Global analysis of the Hfq-mediated RNA interactome discovers a MicA homolog that affects the cytotoxicity, biofilm formation, and resistance to complement of Bordetella pertussis.
Nucleic Acids Res. 2025 Jul 8;53(13). doi: 10.1093/nar/gkaf614.
6. Multi-omics characterization of a lytic phage targeting .
mSystems. 2025 Jun 25:e0058725. doi: 10.1128/msystems.00587-25.
7. De-motif sampling: an approach to decompose hierarchical motifs with applications in T cell recognition.
Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf221.
8. Common variation in meiosis genes shapes human recombination phenotypes and aneuploidy risk.
medRxiv. 2025 Apr 4:2025.04.02.25325097. doi: 10.1101/2025.04.02.25325097.
9. metacp: a versatile software package for combining dependent or independent p-values.
BMC Bioinformatics. 2025 Apr 19;26(1):109. doi: 10.1186/s12859-025-06126-z.
10. Transcriptomic atlas throughout Coccidioides development reveals key phase-enriched transcripts of this important fungal pathogen.
PLoS Biol. 2025 Apr 15;23(4):e3003066. doi: 10.1371/journal.pbio.3003066. eCollection 2025 Apr.