Alignment-free sequence comparison (I): statistics and power.

Authors

Reinert Gesine, Chew David, Sun Fengzhu, Waterman Michael S

Affiliation

Department of Statistics, University of Oxford, Oxford OX1 3TG, UK.

Publication

J Comput Biol. 2009 Dec;16(12):1615-34. doi: 10.1089/cmb.2009.0198.

Abstract

Large-scale comparison of the similarities between two biological sequences is a major issue in computational biology; a fast method, the D(2) statistic, relies on the comparison of the k-tuple content for both sequences. Although it has been known for some years that the D(2) statistic is not suitable for this task, as it tends to be dominated by single-sequence noise, to date no suitable adjustments have been proposed. In this article, we suggest two new variants of the D(2) word count statistic, which we call D(2)(S) and D(2)(*). For D(2)(S), which is a self-standardized statistic, we show that the statistic is asymptotically normally distributed, when sequence lengths tend to infinity, and not dominated by the noise in the individual sequences. The second statistic, D(2)(*), outperforms D(2)(S) in terms of power for detecting the relatedness between the two sequences in our examples; but although it is straightforward to simulate from the asymptotic distribution of D(2)(*), we cannot provide a closed form for power calculations.
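
The abstract describes D(2) as a comparison of k-tuple content but gives no formulas. The sketch below is an illustration, not the authors' implementation. It computes D(2) as the inner product of the two k-tuple count vectors, then computes centered variants in the form in which D(2)(S) and D(2)(*) are commonly written in the alignment-free literature. Those forms, the i.i.d. letter model, and the names kmer_counts, d2, centered_variants and letter_probs are assumptions of this example and are not stated in the abstract.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def kmer_counts(seq, k):
    """Count overlapping k-tuples (words of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """Plain D2: sum over words w of X_w * Y_w (inner product of counts)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx.keys() & cy.keys())


def centered_variants(x, y, k, letter_probs):
    """Return (D2S, D2*) under an ASSUMED i.i.d. letter model.

    letter_probs maps each alphabet letter to its background probability.
    Counts are centered by their expectation (n - k + 1) * p_w, where p_w
    is the product of the letter probabilities of word w.
    Assumed forms (not given in the abstract):
      D2S = sum_w Xc_w * Yc_w / sqrt(Xc_w^2 + Yc_w^2)
      D2* = sum_w Xc_w * Yc_w / (sqrt(n_bar * m_bar) * p_w)
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    n_bar, m_bar = len(x) - k + 1, len(y) - k + 1
    d2s = 0.0
    d2star = 0.0
    for letters in product(letter_probs, repeat=k):
        w = "".join(letters)
        p_w = prod(letter_probs[a] for a in letters)
        xc = cx[w] - n_bar * p_w  # centered count in sequence x
        yc = cy[w] - m_bar * p_w  # centered count in sequence y
        d2star += xc * yc / (sqrt(n_bar * m_bar) * p_w)
        norm = sqrt(xc * xc + yc * yc)
        if norm > 0:  # skip words where both centered counts are zero
            d2s += xc * yc / norm
    return d2s, d2star


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    a, b = "ACGTACGTAC", "ACGTTTGTAC"
    print(d2(a, b, 2))
    print(centered_variants(a, b, 2, uniform))
```

Centering each count by its expected value is what removes the single-sequence noise the abstract refers to: in plain D(2), a word that is frequent in one sequence inflates the sum even when the two sequences are unrelated. In practice the letter probabilities would be estimated from the data rather than fixed to a uniform model as in this example.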
