
Application of Subspace Clustering in DNA Sequence Analysis.

Authors

Wallace Tim, Sekmen Ali, Wang Xiaofei

Affiliations

1 Department of Computer Science, Tennessee State University, Nashville, Tennessee.

2 Department of Biological Sciences, Tennessee State University, Nashville, Tennessee.

Publication information

J Comput Biol. 2015 Oct;22(10):940-52. doi: 10.1089/cmb.2015.0084. Epub 2015 Jul 10.

DOI:10.1089/cmb.2015.0084
PMID:26162018
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4589114/
Abstract

Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis.
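
The random mutation binary tree model and the subspace rank estimate mentioned in the abstract can be illustrated with a small sketch. It is not the authors' code. The one-hot encoding, the per-generation substitution rate, the 95% energy threshold, and the function names `one_hot`, `mutate`, `simulate_tree`, and `effective_rank` are all assumptions made for illustration. The sketch grows a binary speciation tree from one root sequence, stacks the encoded leaves as matrix columns, and counts how many singular values are needed to capture most of the spectrum.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a flat indicator vector of length 4*len(seq).
    This encoding is an assumption; the abstract does not specify one."""
    vec = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        vec[i, BASES.index(base)] = 1.0
    return vec.ravel()

def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`,
    always choosing one of the three other bases."""
    out = []
    for base in seq:
        if rng.random() < rate:
            others = [b for b in BASES if b != base]
            out.append(others[rng.integers(3)])
        else:
            out.append(base)
    return "".join(out)

def simulate_tree(root, depth, rate, rng):
    """Binary speciation tree: in every generation each sequence splits
    into two independently mutated children. Returns the 2**depth leaves."""
    level = [root]
    for _ in range(depth):
        level = [mutate(s, rate, rng) for s in level for _ in range(2)]
    return level

def effective_rank(seqs, energy=0.95):
    """Smallest number of singular values whose squared sum reaches
    `energy` of the total (a hypothetical rank criterion)."""
    X = np.stack([one_hot(s) for s in seqs], axis=1)  # one column per sequence
    s = np.linalg.svd(X, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    root = "".join(BASES[i] for i in rng.integers(4, size=200))
    for rate in (0.0, 0.01, 0.05, 0.2):
        leaves = simulate_tree(root, depth=4, rate=rate, rng=rng)
        print(rate, effective_rank(leaves))
```

With a rate of zero every leaf equals the root, so the matrix has rank one. Raising the rate or the tree depth spreads the leaves over more directions, which is the kind of dependence of rank on time and mutation rate that the abstract describes.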

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae0/4589114/fe1979591f88/fig-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae0/4589114/2903229cb42d/fig-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae0/4589114/ac46b52dc239/fig-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae0/4589114/7559ab036ec3/fig-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae0/4589114/3374bbf19e28/fig-5.jpg

Similar articles

1. Application of Subspace Clustering in DNA Sequence Analysis. J Comput Biol. 2015 Oct;22(10):940-52. doi: 10.1089/cmb.2015.0084. Epub 2015 Jul 10.
2. On the quality of tree-based protein classification. Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
3. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
4. COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations. Bioinformatics. 2006 Apr 1;22(7):779-88. doi: 10.1093/bioinformatics/btl009. Epub 2006 Jan 24.
5. Genotyping of single nucleotide polymorphism using model-based clustering. Bioinformatics. 2004 Mar 22;20(5):718-26. doi: 10.1093/bioinformatics/btg475. Epub 2004 Jan 29.
6. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics. 2006 Jul 15;22(14):e9-15. doi: 10.1093/bioinformatics/btl213.
7. Subspace Weighting Co-Clustering of Gene Expression Data. IEEE/ACM Trans Comput Biol Bioinform. 2019 Mar-Apr;16(2):352-364. doi: 10.1109/TCBB.2017.2705686. Epub 2017 May 18.
8. Hetero: a program to simulate the evolution of DNA on a four-taxon tree. Appl Bioinformatics. 2003;2(3):159-63.
9. Assessment of phylogenomic and orthology approaches for phylogenetic inference. Bioinformatics. 2007 Apr 1;23(7):815-24. doi: 10.1093/bioinformatics/btm015. Epub 2007 Jan 19.
10. PhyloPat: phylogenetic pattern analysis of eukaryotic genes. BMC Bioinformatics. 2006 Sep 1;7:398. doi: 10.1186/1471-2105-7-398.
