A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF.

Authors

Cong Yingnan, Chan Yao-Ban, Ragan Mark A

Affiliations

Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, The University of Queensland, St Lucia, Brisbane, QLD 4072, Australia.

School of Mathematics and Statistics, The University of Melbourne, Parkville, Melbourne, VIC 3010, Australia.

Publication information

Sci Rep. 2016 Jul 25;6:30308. doi: 10.1038/srep30308.

DOI:10.1038/srep30308
PMID:27453035
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4958984/
Abstract

Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences. If a set of contiguous k-mers appears sufficiently more frequently in another phyletic group than in its own, we infer that they have been transferred from the first group to the second. We performed rigorous tests of TF-IDF using simulated and empirical datasets. With the simulated data, we tested our method under different parameter settings for sequence length, substitution rate between and within groups and post-LGT, deletion rate, length of transferred region and k size, and found that we can detect LGT events with high precision and recall. Our method performs better than an established method, ALFY, which has high recall but low precision. Our method is efficient, with runtime increasing approximately linearly with sequence length.
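
The decision rule described in the abstract (contiguous k-mers that are markedly more frequent in another group than in the sequence's own group) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it replaces the TF-IDF weighting with a plain difference of per-group k-mer frequencies, and the function names, the `margin` and `min_run` parameters, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def kmers(seq, k):
    """Return the overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def group_frequency(seqs, k):
    """Map each k-mer to the fraction of sequences in seqs that contain it."""
    counts = Counter()
    for s in seqs:
        counts.update(set(kmers(s, k)))  # count each k-mer once per sequence
    n = len(seqs)
    return {km: c / n for km, c in counts.items()} if n else {}

def candidate_regions(query, own_group, other_group, k=4, margin=0.5, min_run=3):
    """Return (start, end) spans of query, end exclusive, covered by runs of at
    least min_run contiguous k-mers whose frequency in other_group exceeds
    their frequency in own_group by at least margin (hypothetical thresholds).
    own_group should hold the other members of the query's group, not the query."""
    own = group_frequency(own_group, k)
    other = group_frequency(other_group, k)
    flags = [other.get(km, 0.0) - own.get(km, 0.0) >= margin
             for km in kmers(query, k)]
    regions, start = [], None
    for i, flagged in enumerate(flags + [False]):  # trailing False closes a final run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_run:
                regions.append((start, i - 1 + k))  # last k-mer of the run starts at i - 1
            start = None
    return regions

# Toy data (invented): a group-B sequence carrying a segment typical of group A.
group_a = ["TTGTTCAGGCTATT", "GGGTTCAGGCTAGG"]
group_b_others = ["ACACACACACACACACACAC", "ACACACACACACACACACAC"]
query = "ACACACAC" + "GTTCAGGCTA" + "ACACACAC"

print(candidate_regions(query, group_b_others, group_a))  # [(8, 18)]
```

In this toy case the flagged span `query[8:18]` is exactly the inserted segment, and the inferred direction is from group A into the query's group. A real analysis would also need the weighting, thresholds, and hierarchical group structure that the paper describes and this sketch omits.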

Similar articles

1
A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF.
Sci Rep. 2016 Jul 25;6:30308. doi: 10.1038/srep30308.
2
Exploring lateral genetic transfer among microbial genomes using TF-IDF.
Sci Rep. 2016 Jul 25;6:29319. doi: 10.1038/srep29319.
3
Detecting lateral genetic transfer: a phylogenetic approach.
Methods Mol Biol. 2008;452:457-69. doi: 10.1007/978-1-60327-159-2_21.
4
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF.
Front Microbiol. 2017 Jan 19;8:21. doi: 10.3389/fmicb.2017.00021. eCollection 2017.
5
Probabilistic inference of lateral gene transfer events.
BMC Bioinformatics. 2016 Nov 11;17(Suppl 14):431. doi: 10.1186/s12859-016-1268-2.
6
Scaling Up the Phylogenetic Detection of Lateral Gene Transfer Events.
Methods Mol Biol. 2017;1525:421-432. doi: 10.1007/978-1-4939-6622-6_16.
7
Generation of Level-k LGT Networks.
IEEE/ACM Trans Comput Biol Bioinform. 2020 Jan-Feb;17(1):158-164. doi: 10.1109/TCBB.2019.2895344. Epub 2019 Jan 25.
8
Detection of lateral gene transfer events in the prokaryotic tRNA synthetases by the ratios of evolutionary distances method.
J Mol Evol. 2004 May;58(5):615-31. doi: 10.1007/s00239-004-2582-2.
9
Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.
Sci Rep. 2016 Jul 1;6:28970. doi: 10.1038/srep28970.
10
Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution.
Proc Natl Acad Sci U S A. 2007 Jan 16;104(3):870-5. doi: 10.1073/pnas.0606318104. Epub 2007 Jan 9.

Cited by

1
GRAMEP: an alignment-free method based on the maximum entropy principle for identifying SNPs.
BMC Bioinformatics. 2025 Feb 25;26(1):66. doi: 10.1186/s12859-025-06037-z.
2
Current state and future prospects of Horizontal Gene Transfer detection.
NAR Genom Bioinform. 2025 Feb 11;7(1):lqaf005. doi: 10.1093/nargab/lqaf005. eCollection 2025 Mar.
3
Genetic Transfer in Action: Uncovering DNA Flow in an Extremophilic Microbial Community.
Environ Microbiol. 2025 Feb;27(2):e70048. doi: 10.1111/1462-2920.70048.
4
What Do We Gain When Tolerating Loss? The Information Bottleneck Wrings Out Recombination.
Mol Biol Evol. 2025 Mar 5;42(3). doi: 10.1093/molbev/msaf029.
5
A mapping-free natural language processing-based technique for sequence search in nanopore long-reads.
BMC Bioinformatics. 2024 Nov 13;25(1):354. doi: 10.1186/s12859-024-05980-7.
6
Evolutionary Processes Driving the Rise and Fall of ST239, a Dominant Hybrid Pathogen.
mBio. 2021 Dec 21;12(6):e0216821. doi: 10.1128/mBio.02168-21. Epub 2021 Dec 14.
7
INSIDER: alignment-free detection of foreign DNA sequences.
Comput Struct Biotechnol J. 2021 Jun 29;19:3810-3816. doi: 10.1016/j.csbj.2021.06.045. eCollection 2021.
8
An Integrative Computational Approach for the Prediction of Human- Protein-Protein Interactions.
Biomed Res Int. 2020 Dec 19;2020:2082540. doi: 10.1155/2020/2082540. eCollection 2020.
9
Patterns, Profiles, and Parsimony: Dissecting Transcriptional Signatures From Minimal Single-Cell RNA-Seq Output With SALSA.
Front Genet. 2020 Oct 9;11:511286. doi: 10.3389/fgene.2020.511286. eCollection 2020.
10
What Has Been Trending in the Research of Polyhydroxyalkanoates? A Systematic Review.
Front Bioeng Biotechnol. 2020 Sep 10;8:959. doi: 10.3389/fbioe.2020.00959. eCollection 2020.

References cited in this article

1
Exploring lateral genetic transfer among microbial genomes using TF-IDF.
Sci Rep. 2016 Jul 25;6:29319. doi: 10.1038/srep29319.
2
Blue: correcting sequencing errors using consensus and context.
Bioinformatics. 2014 Oct;30(19):2723-32. doi: 10.1093/bioinformatics/btu368. Epub 2014 Jun 11.
3
The purity measure for genomic regions leads to horizontally transferred genes.
J Bioinform Comput Biol. 2013 Dec;11(6):1343002. doi: 10.1142/S0219720013430026. Epub 2013 Dec 2.
4
The distribution of word matches between Markovian sequences with periodic boundary conditions.
J Comput Biol. 2014 Jan;21(1):41-63. doi: 10.1089/cmb.2012.0277. Epub 2013 Oct 26.
5
Alignment-free detection of horizontal gene transfer between closely related bacterial genomes.
Mob Genet Elements. 2011 Sep;1(3):230-235. doi: 10.4161/mge.1.3.18065. Epub 2011 Sep 1.
6
ALF--a simulation framework for genome evolution.
Mol Biol Evol. 2012 Apr;29(4):1115-23. doi: 10.1093/molbev/msr268. Epub 2011 Dec 8.
7
Alignment-free detection of local similarity among viral and bacterial genomes.
Bioinformatics. 2011 Jun 1;27(11):1466-72. doi: 10.1093/bioinformatics/btr176. Epub 2011 Apr 6.
8
Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes.
Genome Res. 2011 Apr;21(4):599-609. doi: 10.1101/gr.115592.110. Epub 2011 Jan 26.
9
Lateral genetic transfer and the construction of genetic exchange communities.
FEMS Microbiol Rev. 2011 Sep;35(5):707-35. doi: 10.1111/j.1574-6976.2010.00261.x. Epub 2011 Jan 21.
10
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers.
Bioinformatics. 2011 Mar 15;27(6):764-70. doi: 10.1093/bioinformatics/btr011. Epub 2011 Jan 7.