Community detection in sequence similarity networks based on attribute clustering.

Authors

Chowdhary Janamejaya, Löffler Frank E, Smith Jeremy C

Affiliations

Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America.

University of Tennessee-Oak Ridge National Laboratory, Joint Institute for Biological Sciences and Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America.

Publication information

PLoS One. 2017 Jul 24;12(7):e0178650. doi: 10.1371/journal.pone.0178650. eCollection 2017.

DOI: 10.1371/journal.pone.0178650
PMID: 28738060
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5524321/

Abstract

Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments.
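
Two of the steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ACDC implementation. The attribute fields (score, identity fraction, alignment length), the min-max rescaling, and the function names are assumptions made for illustration, since the abstract does not list the vector components or the rescaling used. The partition density follows the standard link-community definition, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where community c has m_c links and n_c nodes and M is the total number of links; whether ACDC uses exactly this form is likewise an assumption.

```python
from collections import defaultdict


def rescale(vectors):
    """Min-max rescale each component of the link attribute vectors to [0, 1].

    Assumption: min-max scaling stands in for the paper's rescaling,
    which the abstract does not specify.
    """
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(v, lows, highs))
        for v in vectors
    ]


def partition_density(link_communities):
    """Partition density of a link partition (standard link-community form).

    link_communities maps an undirected link (u, v) to a community label.
    Communities with two or fewer nodes contribute zero.
    """
    edges_by_label = defaultdict(set)
    for (u, v), label in link_communities.items():
        edges_by_label[label].add(frozenset((u, v)))

    total_links = sum(len(edges) for edges in edges_by_label.values())
    if total_links == 0:
        return 0.0

    weighted = 0.0
    for edges in edges_by_label.values():
        m = len(edges)                   # links in this community
        n = len(set().union(*edges))     # nodes touched by those links
        if n > 2:
            weighted += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * weighted / total_links


# Hypothetical attribute vectors: (score, identity fraction, alignment length).
# The first two alignments share a score but differ in the other components.
vectors = [(50.0, 0.30, 120), (50.0, 0.45, 200), (80.0, 0.60, 200)]
rescaled = rescale(vectors)

# Toy network: two triangles joined by one bridging link.
triangle_links = [("A", "B"), ("B", "C"), ("A", "C"),
                  ("D", "E"), ("E", "F"), ("D", "F")]
split = {link: (0 if link[0] in "ABC" else 1) for link in triangle_links}
split[("C", "D")] = 2                    # the bridge forms its own community
merged = {link: 0 for link in split}     # every link in a single community

print(rescaled)
print(partition_density(split), partition_density(merged))
```

In this toy case the first two alignments remain distinguishable after rescaling even though their scores are equal, and the partition that separates the two triangles has a higher partition density (about 0.86) than the partition that merges all links into one community (about 0.2), which is the kind of comparison a density-based refinement step would rely on.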

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0a/5524321/7bfe8ded4d78/pone.0178650.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0a/5524321/0e9e19efcfad/pone.0178650.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0a/5524321/e9b34b6af188/pone.0178650.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0a/5524321/051d8f9987c0/pone.0178650.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0a/5524321/65ff7ad020a2/pone.0178650.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d0a/5524321/268f317e2bbf/pone.0178650.g006.jpg
