A Content-Based Retrieval Framework for Whole Metagenome Sequencing Samples.

Authors

Şener Duygu Dede, Santoni Daniele, Felici Giovanni, Oğul Hasan

Affiliations

Başkent University, Faculty of Engineering, Computer Engineering Department, Ankara, Turkey.

Institute of Systems Analysis and Computer Science "A. Ruberti", National Research Council, Rome, Italy.

Publication information

J Integr Bioinform. 2018 Oct 26;15(4):20170067. doi: 10.1515/jib-2017-0067.

DOI:10.1515/jib-2017-0067
PMID:30367805
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6348744/
Abstract

Finding similarities and differences between metagenomic samples within large repositories has been rather a significant issue for researchers. Over the recent years, content-based retrieval has been suggested by various studies from different perspectives. In this study, a content-based retrieval framework for identifying relevant metagenomic samples is developed. The framework consists of feature extraction, selection methods and similarity measures for whole metagenome sequencing samples. Performance of the developed framework was evaluated on given samples. A ground truth was used to evaluate the system performance such that if the system retrieves patients with the same disease, -called positive samples-, they are labeled as relevant samples otherwise irrelevant. The experimental results show that relevant experiments can be detected by using different fingerprinting approaches. We observed that Latent Semantic Analysis (LSA) Method is a promising fingerprinting approach for representing metagenomic samples and finding relevance among them. Source codes and executable files are available at www.baskent.edu.tr/∼hogul/WMS_retrieval.rar.
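The retrieval loop described in the abstract (fingerprint each sample, rank the others by similarity, then score the ranking against disease labels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the k-mer frequency fingerprint, the cosine measure, the toy reads, and every function name here are hypothetical choices made for the example, and the LSA step the abstract highlights is not included.

```python
from collections import Counter
from math import sqrt


def kmer_fingerprint(reads, k=3):
    """Fingerprint a sample as normalised counts of overlapping k-mers.

    Assumption: k-mer frequencies stand in for the feature extraction step.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse fingerprints stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_id, fingerprints, top_n=2):
    """Rank all other samples by similarity to the query, most similar first."""
    query = fingerprints[query_id]
    scored = [
        (cosine(query, fp), sample_id)
        for sample_id, fp in fingerprints.items()
        if sample_id != query_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sample_id for _, sample_id in scored[:top_n]]


def precision(query_id, retrieved, labels):
    """Fraction of retrieved samples that share the query's disease label."""
    if not retrieved:
        return 0.0
    hits = sum(labels[sample_id] == labels[query_id] for sample_id in retrieved)
    return hits / len(retrieved)


# Toy reads and labels, invented purely for illustration.
samples = {
    "p1": ["ACGTACGTAC", "CGTACGTACG"],
    "p2": ["ACGTACGTTC", "GTACGTACGT"],
    "h1": ["TTTTGGGGTT", "GGGGTTTTGG"],
    "h2": ["TTGGTTGGTT", "GGTTTTGGGG"],
}
labels = {"p1": "disease", "p2": "disease", "h1": "healthy", "h2": "healthy"}

fingerprints = {sid: kmer_fingerprint(reads) for sid, reads in samples.items()}
top = retrieve("p1", fingerprints, top_n=1)
print(top, precision("p1", top, labels))
```

Keeping the fingerprint function separate from the similarity and evaluation steps means a different representation, such as a low-rank projection of the same count vectors, could be swapped in without touching the ranking or the relevance check.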

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/4471c91ac3be/jib-15-20170067-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/e82e874055a4/jib-15-20170067-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/99da8d31a27c/jib-15-20170067-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/8fdce003aea0/jib-15-20170067-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/a2168a3cd81d/jib-15-20170067-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/450129b28068/jib-15-20170067-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a954/6348744/230db5c587af/jib-15-20170067-g007.jpg

Similar articles

1
A Content-Based Retrieval Framework for Whole Metagenome Sequencing Samples.
J Integr Bioinform. 2018 Oct 26;15(4):20170067. doi: 10.1515/jib-2017-0067.
2
Exploiting topic modeling to boost metagenomic reads binning.
BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S2. doi: 10.1186/1471-2105-16-S5-S2. Epub 2015 Mar 18.
3
Exploration and retrieval of whole-metagenome sequencing samples.
Bioinformatics. 2014 Sep 1;30(17):2471-9. doi: 10.1093/bioinformatics/btu340. Epub 2014 May 19.
4
Metagenome Assembly and Contig Assignment.
Methods Mol Biol. 2018;1849:179-192. doi: 10.1007/978-1-4939-8728-3_12.
5
New approaches for metagenome assembly with short reads.
Brief Bioinform. 2020 Mar 23;21(2):584-594. doi: 10.1093/bib/bbz020.
6
Estimating the composition of species in metagenomes by clustering of next-generation read sequences.
Methods. 2014 Oct 1;69(3):213-9. doi: 10.1016/j.ymeth.2014.07.009. Epub 2014 Jul 27.
7
ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads.
Bioinformatics. 2018 Mar 15;34(6):928-935. doi: 10.1093/bioinformatics/btx702.
8
Effect of k-tuple length on sample-comparison with high-throughput sequencing data.
Biochem Biophys Res Commun. 2016 Jan 22;469(4):1021-7. doi: 10.1016/j.bbrc.2015.11.094. Epub 2015 Dec 22.
9
Estimating the total genome length of a metagenomic sample using k-mers.
BMC Genomics. 2019 Apr 4;20(Suppl 2):183. doi: 10.1186/s12864-019-5467-x.
10
A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data.
Int J Mol Sci. 2017 Oct 11;18(10):2124. doi: 10.3390/ijms18102124.
