Density parameter estimation for finding clusters of homologous proteins--tracing actinobacterial pathogenicity lifestyles.

Affiliation

Max Planck Institute for Informatics, Saarland University, 66123 Saarbrücken, Germany.

Publication information

Bioinformatics. 2013 Jan 15;29(2):215-22. doi: 10.1093/bioinformatics/bts653. Epub 2012 Nov 9.

DOI: 10.1093/bioinformatics/bts653
PMID: 23142964
Abstract

MOTIVATION

Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time-consuming task. Having a reliable method for detecting clusters of homologous proteins across a large set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles.

RESULTS

Our main contribution is a method for identifying a suitable and robust density parameter for protein homology detection without a given gold standard. Therefore, we study the core genome of 89 actinobacteria. This allows us to incorporate background knowledge, i.e. the assumption that a set of evolutionarily closely related species should share a comparably high number of evolutionarily conserved proteins (emerging from phylum-specific housekeeping genes). We apply our strategy to find genes/proteins that are specific for certain actinobacterial lifestyles, i.e. different types of pathogenicity. The whole study was performed with transitivity clustering, as it only requires a single intuitive density parameter and has been shown to be well applicable for the task of protein sequence clustering. Note, however, that the presented strategy generally does not depend on our clustering method but can easily be adapted to other clustering approaches.
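The abstract does not spell out the selection criterion, so the following is only a minimal sketch of one possible reading: scan candidate density thresholds, cluster the similarity graph at each one, and keep the threshold that yields the most clusters containing a protein from every species (a proxy for the shared core genome). The clustering step here is plain connected components, used as a simple stand-in and not transitivity clustering itself. All function names, the similarity scores, and the toy data are hypothetical.

```python
from collections import defaultdict


def cluster_by_threshold(similarities, proteins, threshold):
    """Stand-in clustering (assumption): connected components of the graph
    whose edges are protein pairs with similarity >= threshold."""
    parent = {p: p for p in proteins}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in proteins:
        clusters[find(p)].add(p)
    return list(clusters.values())


def count_core_clusters(clusters, species_of, all_species):
    """Count clusters that contain at least one protein from every species."""
    return sum(1 for c in clusters if {species_of[p] for p in c} == all_species)


def pick_threshold(similarities, species_of, thresholds):
    """Return the threshold with the most core clusters, plus all counts."""
    proteins = list(species_of)
    all_species = set(species_of.values())
    counts = {
        t: count_core_clusters(
            cluster_by_threshold(similarities, proteins, t), species_of, all_species
        )
        for t in thresholds
    }
    best = max(thresholds, key=lambda t: counts[t])
    return best, counts


# Hypothetical toy data: three species (A, B, C), two protein families.
species_of = {"a1": "A", "b1": "B", "c1": "C", "a2": "A", "b2": "B", "c2": "C"}
similarities = {
    ("a1", "b1"): 90, ("b1", "c1"): 85, ("a1", "c1"): 80,  # family 1
    ("a2", "b2"): 70, ("b2", "c2"): 65, ("a2", "c2"): 60,  # family 2
    ("a1", "a2"): 30, ("b1", "c2"): 25,                    # weak cross-family hits
}
best, counts = pick_threshold(similarities, species_of, [20, 50, 75, 95])
print(best, counts)
```

In this toy case a threshold that is too low merges both families into one cluster, and one that is too high splits the weaker family apart, so the count of core clusters peaks in between. Swapping in another clustering method only requires replacing `cluster_by_threshold`, which matches the abstract's remark that the strategy does not depend on a particular clustering approach.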

AVAILABILITY

All results are publicly available at http://transclust.mmci.uni-saarland.de/actino_core/ or as Supplementary Material of this article.

CONTACT

roettger@mpi-inf.mpg.de

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Density parameter estimation for finding clusters of homologous proteins--tracing actinobacterial pathogenicity lifestyles.
   Bioinformatics. 2013 Jan 15;29(2):215-22. doi: 10.1093/bioinformatics/bts653. Epub 2012 Nov 9.
2. Phylogenetic analyses of phylum Actinobacteria based on whole genome sequences.
   Res Microbiol. 2013 Sep;164(7):718-28. doi: 10.1016/j.resmic.2013.04.002. Epub 2013 Apr 19.
3. A hybrid clustering approach to recognition of protein families in 114 microbial genomes.
   BMC Bioinformatics. 2004 Apr 29;5:45. doi: 10.1186/1471-2105-5-45.
4. Signature proteins that are distinctive characteristics of Actinobacteria and their subgroups.
   Antonie Van Leeuwenhoek. 2006 Jul;90(1):69-91. doi: 10.1007/s10482-006-9061-2. Epub 2006 May 3.
5. Molecular detection and phylogenetic analysis of the alkane 1-monooxygenase gene from Gordonia spp.
   Syst Appl Microbiol. 2010 Mar;33(2):53-9. doi: 10.1016/j.syapm.2009.11.003. Epub 2010 Jan 4.
6. [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
   Yi Chuan Xue Bao. 2004 May;31(5):431-43.
7. Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites.
   Nat Biotechnol. 2003 Apr;21(4):435-9. doi: 10.1038/nbt802. Epub 2003 Mar 10.
8. Revealing remote protein homology with sequence similarity and a modularity-based approach.
   Theor Biol Forum. 2011;104(1):57-68.
9. Spectral clustering of protein sequences.
   Nucleic Acids Res. 2006 Mar 17;34(5):1571-80. doi: 10.1093/nar/gkj515. Print 2006.
10. Efficient functional clustering of protein sequences using the Dirichlet process.
    Bioinformatics. 2008 Aug 15;24(16):1765-71. doi: 10.1093/bioinformatics/btn244. Epub 2008 May 29.

Cited by

1. Transcriptome profile of Corynebacterium pseudotuberculosis in response to iron limitation.
   BMC Genomics. 2019 Aug 20;20(1):663. doi: 10.1186/s12864-019-6018-1.
2. Guiding biomedical clustering with ClustEval.
   Nat Protoc. 2018 Jun;13(6):1429-1444. doi: 10.1038/nprot.2018.038. Epub 2018 May 24.
3. The Druggable Pocketome of : A New Approach for Putative Druggable Targets.
   Front Genet. 2018 Feb 13;9:44. doi: 10.3389/fgene.2018.00044. eCollection 2018.
4. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis.
   J Integr Bioinform. 2017 Jul 5;14(2):20170010. doi: 10.1515/jib-2017-0010.
5. PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data.
   Sci Rep. 2017 Jan 4;7:39194. doi: 10.1038/srep39194.
6. Comparing the performance of biomedical clustering methods.
   Nat Methods. 2015 Nov;12(11):1033-8. doi: 10.1038/nmeth.3583. Epub 2015 Sep 21.
7. Comparative analysis of essential genes in prokaryotic genomic islands.
   Sci Rep. 2015 Jul 30;5:12561. doi: 10.1038/srep12561.
8. CMRegNet-An interspecies reference database for corynebacterial and mycobacterial regulatory networks.
   BMC Genomics. 2015 Jun 11;16(1):452. doi: 10.1186/s12864-015-1631-0.
9. NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families.
   Sci Rep. 2014 Oct 29;4:6810. doi: 10.1038/srep06810.