How independent are the appearances of n-mers in different genomes?

Author information

Fofanov Yuriy, Luo Yi, Katili Charles, Wang Jim, Belosludtsev Yuri, Powdrill Thomas, Belapurkar Chetan, Fofanov Viacheslav, Li Tong-Bin, Chumakov Sergey, Pettitt B Montgomery

Affiliations

Department of Computer Science, University of Houston, TX 77204-3010, USA.

Publication information

Bioinformatics. 2004 Oct 12;20(15):2421-8. doi: 10.1093/bioinformatics/bth266. Epub 2004 Apr 15.

DOI:10.1093/bioinformatics/bth266
PMID:15087315
Abstract

MOTIVATION

Analysis of the statistical properties of DNA sequences is important for evolutionary biology as well as for DNA probe and PCR technologies. These technologies, in turn, can be used for organism identification, with applications in the diagnosis of infectious diseases, environmental studies, etc.

RESULTS

We present results of a correlation analysis of the distributions of presence/absence of short nucleotide subsequences of different lengths ('n-mers', n = 5-20) in more than 1500 microbial and virus genomes, together with five genomes of multicellular organisms (including human). We record whether a given n-mer is present or absent in a given genome (frequency of presence), rather than the more commonly calculated number of times an n-mer appears in one or more genomes (frequency of appearance). For organisms that are not close relatives of each other, the presence/absence of different 7-20-mers in their genomes is not correlated. For close biological relatives, some correlation in the presence of n-mers in this range appears, but it is not as strong as expected. This suppressed correlation among the n-mers present in different genomes suggests that random sets of n-mers (with appropriately chosen n) could be used to discriminate between genomes of different organisms, and possibly between individual genomes of the same species, including human, with a low probability of error.
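The distinction between presence/absence and appearance counts can be made concrete with a small sketch. The abstract does not state which correlation statistic the authors used, so the example below assumes the phi coefficient (Pearson correlation of two binary vectors) computed over all 4^n possible n-mers. The function names `present_nmers` and `presence_correlation` are hypothetical, and only the given strand is scanned.

```python
from math import sqrt

DNA_ALPHABET = frozenset("ACGT")


def present_nmers(genome: str, n: int) -> set[str]:
    """Return the set of distinct n-mers that occur at least once in genome.

    Only presence is recorded, not how many times each n-mer occurs.
    Windows containing characters other than A, C, G, T are skipped.
    """
    genome = genome.upper()
    found = set()
    for i in range(len(genome) - n + 1):
        window = genome[i:i + n]
        if DNA_ALPHABET.issuperset(window):
            found.add(window)
    return found


def presence_correlation(genome_a: str, genome_b: str, n: int) -> float:
    """Phi coefficient between the presence/absence vectors of two genomes.

    Assumption: this statistic is chosen for illustration only; it is not
    taken from the paper. The vectors range over all 4**n possible n-mers.
    """
    a = present_nmers(genome_a, n)
    b = present_nmers(genome_b, n)
    total = 4 ** n
    n11 = len(a & b)                  # present in both genomes
    n10 = len(a - b)                  # present only in genome_a
    n01 = len(b - a)                  # present only in genome_b
    n00 = total - n11 - n10 - n01     # absent from both genomes
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Undefined when one vector is constant (all present or all absent).
        return 0.0
    return (n11 * n00 - n10 * n01) / denom
```

A value near 0 would correspond to the uncorrelated case described for distant organisms, and a value near 1 to genomes sharing almost the same set of n-mers. For small n nearly every n-mer is present in any sizeable genome, so the statistic becomes uninformative, which is consistent with the abstract restricting its claim to 7-20-mers.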
