Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features.

Authors

Rusakovica Julija, Hallinan Jennifer, Wipat Anil, Zuliani Paolo

Affiliations

School of Computing Science, and Centre for Synthetic Biology and Bioexploitation, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK.

Publication information

J Integr Bioinform. 2014 Jun 30;11(2):243. doi: 10.2390/biecoll-jib-2014-243.

DOI: 10.2390/biecoll-jib-2014-243
PMID: 24980693

Abstract

The spread of drug resistance amongst clinically important bacteria is a serious and growing problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. To overcome this problem, a range of techniques for dimensionality reduction have been developed. One such approach is known as latent-variable models [2]. In latent-variable models, dimensionality reduction is achieved by representing high-dimensional data by a few hidden or latent variables, which are not directly observed but inferred from the observed variables present in the model. Probabilistic Latent Semantic Analysis (PLSA), also known as Probabilistic Latent Semantic Indexing, is an extension of Latent Semantic Analysis (LSA) [3]. PLSA is based on a mixture decomposition derived from a latent class model. The main objective of the algorithm, as in LSA, is to represent high-dimensional co-occurrence information in a lower-dimensional way in order to discover the hidden semantic structure of the data using a probabilistic framework. In this work we applied the PLSA approach to analyse the common genomic features in methicillin-resistant Staphylococcus aureus, using tokens derived from amino acid sequences rather than DNA. We characterised genome-scale amino acid sequences in terms of their components, and then investigated the relationships between genomes and tokens and the phenotypes they generated. As a control we used the non-pathogenic model Gram-positive bacterium Bacillus subtilis.
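
As a rough illustration of the kind of analysis the abstract describes, the sketch below fits a PLSA model to a genome-by-token count matrix using the standard EM updates for the mixture decomposition P(w | d) = sum over z of P(w | z) P(z | d). It is not the authors' implementation. The abstract does not say how tokens were derived from amino acid sequences, so the overlapping k-mer tokenisation, the function names `kmer_counts` and `plsa`, the toy sequences, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np
from collections import Counter


def kmer_counts(sequences, k=3):
    """Count overlapping k-mers in each sequence (an assumed token scheme)."""
    counts = [Counter(s[i:i + k] for i in range(len(s) - k + 1)) for s in sequences]
    vocab = sorted(set().union(*counts))
    # Rows are documents (genomes), columns are tokens.
    matrix = np.array([[c[t] for t in vocab] for c in counts], dtype=float)
    return matrix, vocab


def plsa(n_dw, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM; n_dw[d, w] is the count of token w in document d."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = n_dw.shape
    p_z_d = rng.random((n_docs, n_topics))    # P(z | d), random start
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))   # P(w | z), random start
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from expected counts n(d, w) * P(z | d, w).
        expected = n_dw[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z


# Toy usage with made-up peptide strings standing in for whole proteomes.
seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GSSGSSGSSG"]
counts, vocab = kmer_counts(seqs, k=2)
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

Each row of `p_z_d` is a low-dimensional description of one genome as a mixture over latent classes, and each row of `p_w_z` shows which tokens characterise a class. Comparing these rows across pathogenic and control genomes is one plausible way to look for shared features, though the abstract does not specify the authors' exact procedure.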

Similar articles

1. Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features.
   J Integr Bioinform. 2014 Jun 30;11(2):243. doi: 10.2390/biecoll-jib-2014-243.
2. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].
   Guang Pu Xue Yu Guang Pu Fen Xi. 2011 Sep;31(9):2471-5.
3. Modeling semantic aspects for cross-media image indexing.
   IEEE Trans Pattern Anal Mach Intell. 2007 Oct;29(10):1802-17. doi: 10.1109/TPAMI.2007.1097.
4. Large-scale latent semantic analysis.
   Behav Res Methods. 2011 Jun;43(2):414-23. doi: 10.3758/s13428-010-0050-z.
5. Concise representation of mass spectrometry images by probabilistic latent semantic analysis.
   Anal Chem. 2008 Dec 15;80(24):9649-58. doi: 10.1021/ac801303x.
6. Application of Akaike information criterion assisted probabilistic latent semantic analysis on non-trilinear total synchronous fluorescence spectroscopic data sets: Automatizing fluorescence based multicomponent mixture analysis.
   Anal Chim Acta. 2019 Jul 25;1062:60-67. doi: 10.1016/j.aca.2019.03.009. Epub 2019 Mar 9.
7. PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis.
   Proteins. 2008 Aug;72(2):693-710. doi: 10.1002/prot.21944.
8. The research on medical image classification algorithm based on PLSA-BOW model.
   Technol Health Care. 2016 Apr 29;24 Suppl 2:S665-74. doi: 10.3233/THC-161194.
9. Probabilistic latent semantic analysis of composite excitation-emission matrix fluorescence spectra of multicomponent system.
   Spectrochim Acta A Mol Biomol Spectrosc. 2020 Oct 5;239:118518. doi: 10.1016/j.saa.2020.118518. Epub 2020 May 22.
10. Genomic characterization of ribitol teichoic acid synthesis in Staphylococcus aureus: genes, genomic organization and gene duplication.
    BMC Genomics. 2006 Apr 5;7:74. doi: 10.1186/1471-2164-7-74.

Cited by

1. Bacterial plasmid-associated and chromosomal proteins have fundamentally different properties in protein interaction networks.
   Sci Rep. 2022 Nov 10;12(1):19203. doi: 10.1038/s41598-022-20809-0.