
Transposon identification using profile HMMs.

Affiliation

Department of Statistics, Harvard University, Cambridge, MA, USA.

Publication information

BMC Genomics. 2010 Feb 10;11 Suppl 1(Suppl 1):S10. doi: 10.1186/1471-2164-11-S1-S10.

DOI:10.1186/1471-2164-11-S1-S10
PMID:20158867
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2822524/

Abstract

BACKGROUND

Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or have origins in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because they are poorly conserved. Profile hidden Markov models (profile HMMs), widely used for protein sequence family modeling, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to prematurely converge to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since it has only four letters as opposed to the twenty residues of the amino acid alphabet.
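The sensitivity of Baum-Welch to its starting point can be seen on a toy problem. The sketch below is illustrative only: it runs plain Baum-Welch on a small, fully connected two-state HMM over the DNA alphabet, not on a profile HMM, and it does not implement Conditional Baum-Welch or Dynamic Model Surgery. The generating parameters, the random initialization, and the function names (`sample_sequences`, `forward_backward`, `em_step`, `train`) are all assumptions made for the example.

```python
import math
import random

ALPHABET = "ACGT"  # the four-letter DNA alphabet; symbols are stored as indices 0..3


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def sample_sequences(n_seqs, length, seed):
    """Draw sequences from a hypothetical 2-state HMM (AT-rich vs GC-rich)."""
    rng = random.Random(seed)
    stay = 0.9  # assumed probability of remaining in the current state
    emit = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
    seqs = []
    for _ in range(n_seqs):
        state, seq = rng.randrange(2), []
        for _ in range(length):
            seq.append(rng.choices(range(4), weights=emit[state])[0])
            if rng.random() > stay:
                state = 1 - state
        seqs.append(seq)
    return seqs


def forward_backward(seq, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta, scale, log-likelihood."""
    n, T = len(pi), len(seq)
    alpha, beta, scale = [None] * T, [None] * T, [0.0] * T
    row = [pi[i] * B[i][seq[0]] for i in range(n)]
    scale[0] = sum(row)
    alpha[0] = [a / scale[0] for a in row]
    for t in range(1, T):
        row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][seq[t]]
               for j in range(n)]
        scale[t] = sum(row)
        alpha[t] = [a / scale[t] for a in row]
    beta[T - 1] = [1.0] * n
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][seq[t + 1]] * beta[t + 1][j]
                       for j in range(n)) / scale[t + 1] for i in range(n)]
    return alpha, beta, scale, sum(math.log(s) for s in scale)


def em_step(seqs, pi, A, B):
    """One Baum-Welch update; also returns the log-likelihood of the old parameters."""
    n, m = len(pi), len(ALPHABET)
    pi_num = [0.0] * n
    A_num = [[0.0] * n for _ in range(n)]
    B_num = [[0.0] * m for _ in range(n)]
    total = 0.0
    for seq in seqs:
        alpha, beta, scale, loglik = forward_backward(seq, pi, A, B)
        total += loglik
        for t in range(len(seq)):
            for i in range(n):
                gamma = alpha[t][i] * beta[t][i]  # posterior of state i at time t
                if t == 0:
                    pi_num[i] += gamma
                B_num[i][seq[t]] += gamma
                if t + 1 < len(seq):
                    for j in range(n):  # expected i -> j transition counts
                        A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                        * beta[t + 1][j] / scale[t + 1])
    return (normalize(pi_num), [normalize(r) for r in A_num],
            [normalize(r) for r in B_num], total)


def train(seqs, n_states, seed, iters=30):
    """Run Baum-Welch from a random start; return parameters and log-likelihood trace."""
    rng = random.Random(seed)

    def rand_row(k):
        return normalize([rng.random() + 0.1 for _ in range(k)])

    pi = rand_row(n_states)
    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(len(ALPHABET)) for _ in range(n_states)]
    trace = []
    for _ in range(iters):
        pi, A, B, loglik = em_step(seqs, pi, A, B)
        trace.append(loglik)
    return pi, A, B, trace


if __name__ == "__main__":
    seqs = sample_sequences(n_seqs=20, length=60, seed=0)
    for seed in range(3):
        *_, trace = train(seqs, n_states=2, seed=seed)
        # trace[-1] is the log-likelihood measured just before the final update
        print(f"start {seed}: log-likelihood {trace[-1]:.3f}")
```

Each run's log-likelihood trace should be non-decreasing, which is the standard EM guarantee, but nothing forces different random starts to reach the same final value. Comparing the printed values across seeds is a simple way to check whether a given data set exhibits the local-optimum behavior described above.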

RESULTS

We demonstrate with a simulation study and with an application to modeling the MIR family of transposons that two recently introduced methods, Conditional Baum-Welch and Dynamic Model Surgery, achieve better estimates of the parameters of profile HMMs across a range of conditions.

CONCLUSIONS

We argue that these new algorithms expand the range of potential applications of profile HMMs to many important DNA sequence family modeling problems, including that of searching for and modeling the virus-like transposons that are found in all known genomes.

Figure images (hosted on PMC):

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/dd411ad139de/1471-2164-11-S1-S10-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/0e4f35cca253/1471-2164-11-S1-S10-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/2c0f8003c508/1471-2164-11-S1-S10-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/8a1636b0de17/1471-2164-11-S1-S10-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/152b569af4b9/1471-2164-11-S1-S10-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/bb9fdd3dc1b3/1471-2164-11-S1-S10-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/02007939e367/1471-2164-11-S1-S10-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d279/2822524/9f77494161a9/1471-2164-11-S1-S10-8.jpg
