Computational inference of grammars for larger-than-gene structures from annotated gene sequences.

Affiliation

Australian Institute of Health Innovation, University of New South Wales, Australia.

Publication information

Bioinformatics. 2011 Mar 15;27(6):791-6. doi: 10.1093/bioinformatics/btr036. Epub 2011 Jan 22.

DOI: 10.1093/bioinformatics/btr036
PMID: 21258064
Abstract

MOTIVATION

Larger-than-gene structures (LGS) are DNA segments that include at least one gene and often other segments such as inverted repeats and gene promoters. Mobile genetic elements (MGE) such as integrons are LGS that play an important role in horizontal gene transfer, primarily in Gram-negative organisms. Because of the number of genes involved, known LGS have a profound effect on an organism's virulence, antibiotic resistance and other properties. Expert-compiled grammars have been shown to be an effective computational representation of LGS, well suited to automating annotation and supporting de novo gene discovery. However, development of LGS grammars by experts is labour intensive and restricted to known LGS.

OBJECTIVES

This study uses computational grammar inference methods to automate LGS discovery. We compare the ability of six algorithms to infer LGS grammars from DNA sequences annotated with genes and other short sequences. We then compare the predictive power of the learned grammars against an expert-developed grammar for gene cassette arrays found in Class 1, 2 and 3 integrons, which are modular LGS containing up to 9 of about 240 cassette types.
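To make the idea of a grammar over annotated features concrete, the following is a minimal sketch of a hand-written toy grammar and a recognizer for it. It is not the expert-developed grammar or any of the inferred grammars from the study. The token names (`attI`, `gene`, `attC`) and the function `parse_cassette_array` are hypothetical, chosen only for illustration, and the sketch assumes each sequence has already been reduced to an ordered list of feature labels.

```python
# Toy grammar (an illustrative assumption, not taken from the paper):
#   array    -> "attI" cassette+
#   cassette -> "gene" "attC"
# Input is an ordered list of annotation labels, not raw DNA.

def parse_cassette_array(features):
    """Return the number of cassettes if `features` matches the toy grammar, else None."""
    # The array must start with the (hypothetical) integron attachment site label.
    if not features or features[0] != "attI":
        return None
    body = features[1:]
    # At least one cassette is required, and each cassette is exactly two labels.
    if len(body) == 0 or len(body) % 2 != 0:
        return None
    for i in range(0, len(body), 2):
        if body[i] != "gene" or body[i + 1] != "attC":
            return None
    return len(body) // 2


if __name__ == "__main__":
    print(parse_cassette_array(["attI", "gene", "attC", "gene", "attC"]))
    print(parse_cassette_array(["gene", "attC"]))
```

A grammar inference algorithm would aim to learn rules of this kind from many annotated sequences rather than have an expert write them by hand.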

RESULTS

Using a Bayesian generalization algorithm, our inferred grammar was able to predict more than 95% of MGE structures in a corpus of 1760 sequences obtained from GenBank (F-score 75%). Even with 100% noise added to the training and test sets, we obtained an F-score of 68%, indicating that the method is robust and has the potential to predict de novo LGS structures when the underlying gene features are known.
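For readers unfamiliar with the metric, the sketch below shows how an F-score is commonly computed from counts of correct and incorrect predictions. The abstract does not state which F-score variant was used, so the balanced F1 (`beta=1.0`) is an assumption here, and the function name `f_score` and the example counts are hypothetical rather than data from the study.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """Compute the F-beta score from prediction counts.

    beta=1.0 gives the balanced F1 score (an assumption; the abstract
    does not say which variant was reported).
    """
    # With no true positives, precision and recall are both zero or undefined.
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 3 correct predictions, 1 spurious, 1 missed.
    print(f_score(3, 1, 1))
```

Because the F-score combines precision and recall, a grammar can recover most true structures and still score lower if it also predicts structures that are not present.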

AVAILABILITY

http://www2.chi.unsw.edu.au/attacca.
