
Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

Affiliation

Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India.

Publication information

PLoS One. 2013;8(2):e46468. doi: 10.1371/journal.pone.0046468. Epub 2013 Feb 15.

DOI:10.1371/journal.pone.0046468
PMID:23457439
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3574063/
Abstract


Remote homology detection among proteins using only unlabelled sequences is a central problem in comparative genomics. Existing cluster kernel methods based on neighborhoods and profiles, together with Markov clustering algorithms, are currently the most popular methods for protein family recognition. In those methods, the deviation from random walks with inflation, or the dependency on a hard threshold in the similarity measure, calls for an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity to enhance sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels exploits the unsupervised protein sequences more effectively, globally reducing inter-cluster walks. When combined with corrections based on a modified symmetry-based proximity norm that de-emphasizes outliers, the technique proposed in this article outperforms the other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with state-of-the-art string and mismatch kernels also shows superior performance scores for the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore, the proposed spectral clustering framework over combined local alignment kernels with modified symmetry-based correction achieves superior performance, with better biological relevance, for unsupervised remote homolog detection, even in multi-domain and promiscuous-domain proteins from Genolevures database families. Source code is available upon request.
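The abstract combines two generic ingredients: a neighborhood kernel, which smooths a base similarity by averaging it over each sequence's neighbors (in the style of the neighborhood cluster kernel of Weston et al., 2005), and spectral clustering on the resulting affinity matrix. The sketch below illustrates only those two ingredients on a toy matrix. It assumes a precomputed symmetric, non-negative base similarity `K` (for example, local alignment scores), defines a neighborhood as the k highest-scoring sequences, and uses a standard two-way normalized spectral cut. The function names, the k-nearest-neighbor rule and the toy scores are hypothetical; the paper's combined local alignment kernels, Markov similarity and modified symmetry-based proximity correction are not reproduced here.

```python
import math


def neighborhood_kernel(K, k):
    """Hypothetical k-NN neighborhood kernel: the similarity of i and j is the
    mean base similarity between the k top-ranked neighbors of i and of j.
    Neighbors are ranked by K itself, so a sequence is normally its own
    nearest neighbor (highest self-score)."""
    n = len(K)
    nbrs = [sorted(range(n), key=lambda j: -K[i][j])[:k] for i in range(n)]
    return [[sum(K[a][b] for a in nbrs[i] for b in nbrs[j]) / (k * k)
             for j in range(n)] for i in range(n)]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_bipartition(W, iters=500):
    """Two-way spectral cut of a non-negative affinity matrix W.

    M = D^-1/2 W D^-1/2 has leading eigenvector proportional to sqrt(degree).
    The second eigenvector is found by power iteration on M + I (the shift
    keeps every eigenvalue non-negative), re-orthogonalizing against the
    leading eigenvector each step; the sign of D^-1/2 v gives the label."""
    n = len(W)
    s = [math.sqrt(sum(row)) for row in W]  # sqrt of node degrees
    M = [[W[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]
    v1 = _normalize(s)
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        # Remove the trivial component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(v, v1))
        v = _normalize([a - dot * b for a, b in zip(v, v1)])
        # One power-iteration step with (M + I).
        v = [sum(M[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
    dot = sum(a * b for a, b in zip(v, v1))
    v = [a - dot * b for a, b in zip(v, v1)]
    return [1 if v[i] / s[i] > 0 else 0 for i in range(n)]


# Toy base scores for six sequences in two families {0,1,2} and {3,4,5}:
# self-score 10, within-family 6, across families 1.
K = [[10 if i == j else 6 if (i < 3) == (j < 3) else 1 for j in range(6)]
     for i in range(6)]
W = neighborhood_kernel(K, k=2)
labels = spectral_bipartition(W)
print(labels)
```

On this toy input the first three and the last three sequences receive different labels. Power iteration is used only to keep the sketch dependency-free; a real pipeline would use a dense or sparse eigensolver and typically more than two clusters.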

CONTACT

sarkar@labri.fr.


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50e3/3574063/e5f6b9eb132e/pone.0046468.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50e3/3574063/db810e34ca96/pone.0046468.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50e3/3574063/e17bc8f7a9a9/pone.0046468.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50e3/3574063/387818382545/pone.0046468.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50e3/3574063/c638e16ca621/pone.0046468.g005.jpg

Similar articles

1. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.
   PLoS One. 2013;8(2):e46468. doi: 10.1371/journal.pone.0046468. Epub 2013 Feb 15.
2. Protein homology detection using string alignment kernels.
   Bioinformatics. 2004 Jul 22;20(11):1682-9. doi: 10.1093/bioinformatics/bth141. Epub 2004 Feb 26.
3. Profile-based string kernels for remote homology detection and motif extraction.
   J Bioinform Comput Biol. 2005 Jun;3(3):527-50. doi: 10.1142/s021972000500120x.
4. Mismatch string kernels for discriminative protein classification.
   Bioinformatics. 2004 Mar 1;20(4):467-76. doi: 10.1093/bioinformatics/btg431. Epub 2004 Jan 22.
5. Profile-based direct kernels for remote homology detection and fold recognition.
   Bioinformatics. 2005 Dec 1;21(23):4239-47. doi: 10.1093/bioinformatics/bti687. Epub 2005 Sep 27.
6. Profile-based string kernels for remote homology detection and motif extraction.
   Proc IEEE Comput Syst Bioinform Conf. 2004:152-60. doi: 10.1109/csb.2004.1332428.
7. Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection.
   Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31.
8. ProClust: improved clustering of protein sequences with an extended graph-based approach.
   Bioinformatics. 2002;18 Suppl 2:S182-91. doi: 10.1093/bioinformatics/18.suppl_2.s182.
9. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
   BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
10. Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing.
    BMC Bioinformatics. 2007 Oct 17;8:396. doi: 10.1186/1471-2105-8-396.

Cited by

1. Using hierarchical cluster models to systematically identify groups of jobs with similar occupational questionnaire response patterns to assist rule-based expert exposure assessment in population-based studies.
   Ann Occup Hyg. 2015 May;59(4):455-66. doi: 10.1093/annhyg/meu101. Epub 2014 Dec 3.
