Using growing self-organising maps to improve the binning process in environmental whole-genome shotgun sequencing.

Authors

Chan Chon-Kit Kenneth, Hsu Arthur L, Tang Sen-Lin, Halgamuge Saman K

Affiliation

Dynamic Systems & Control Group, Department of Mechanical Engineering, University of Melbourne, VIC 3010, Australia.

Publication

J Biomed Biotechnol. 2008;2008:513701. doi: 10.1155/2008/513701.

PMID: 18288261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2235928/

Abstract

Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported binning strategy that combines oligonucleotide frequency with self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that, compared to the other three, dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences. Furthermore, we observed that increasing the order of oligonucleotide frequency may worsen the assignment result in some cases, which indicates the possible existence of an optimal species-specific oligonucleotide frequency. We replaced the SOM with a growing self-organising map (GSOM), which gave comparable results with a 7%-15% speed improvement.
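
The oligonucleotide-frequency features mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function name `kmer_frequencies`, the normalisation to relative frequencies, the lexicographic A/C/G/T ordering, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration. Setting `k` to 2, 3, 4, or 5 gives di-, tri-, tetra-, or pentanucleotide vectors of length 16, 64, 256, or 1024.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return the relative frequency of every k-mer over A/C/G/T.

    The vector is ordered lexicographically (AA, AC, AG, AT, CA, ...).
    Windows containing any other character (e.g. N) are skipped;
    that handling is an assumption, not taken from the paper.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # No valid window: return an all-zero vector of the right length.
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    fragment = "ACGTACGTTTGACCA"  # toy fragment; the paper works with 10 kb sequences
    vector = kmer_frequencies(fragment, 4)
    print(len(vector))
```

Each fragment then becomes one fixed-length input vector for a clustering method such as a SOM or GSOM, which is why the vector length grows as 4^k with the chosen order.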

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af2/2235928/2d983ebe1ca8/JBB2008-513701.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af2/2235928/bcca64c881a9/JBB2008-513701.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af2/2235928/01fa6ea798c1/JBB2008-513701.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af2/2235928/98f977d2f0cd/JBB2008-513701.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2af2/2235928/fceb18eac5bc/JBB2008-513701.005.jpg
