HomologMiner: looking for homologous genomic groups in whole genomes.

Authors

Hou Minmei, Berman Piotr, Hsu Chih-Hao, Harris Robert S

Affiliation

Department of Computer Science & Engineering, Penn State University, PA, USA.

Publication

Bioinformatics. 2007 Apr 15;23(8):917-25. doi: 10.1093/bioinformatics/btm048. Epub 2007 Feb 18.

DOI: 10.1093/bioinformatics/btm048
PMID: 17308341

Abstract

MOTIVATION

Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain.

RESULTS

We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families.
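As a rough illustration of what grouping homologous sequences involves computationally, the sketch below merges segments into groups whenever a pairwise alignment hit links them. This is plain single-linkage clustering with a union-find structure, not HomologMiner's algorithm: the abstract does not specify how its "consistent duplicate units" are computed. The function name `cluster_segments`, the hit format, and the coordinates are all hypothetical.

```python
from collections import defaultdict


def cluster_segments(hits):
    """Group segment IDs into connected components of the hit graph.

    `hits` is an iterable of (segment_a, segment_b) pairs, each pair
    standing for one pairwise alignment between two genomic segments.
    This is generic single-linkage clustering, NOT the method of the paper.
    """
    parent = {}

    def find(x):
        # Register unseen segments as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in hits:
        union(a, b)

    groups = defaultdict(set)
    for segment in list(parent):
        groups[find(segment)].add(segment)

    # Sort members and groups so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical alignment hits between made-up segment coordinates.
hits = [
    ("chr1:100-900", "chr7:5000-5800"),
    ("chr7:5000-5800", "chrX:20-820"),
    ("chr2:10-400", "chr2:9000-9390"),
]
groups = cluster_segments(hits)
for group in groups:
    print(group)
```

Single linkage tends to chain unrelated segments together through partial overlaps, which is one reason a stricter definition of group membership and explicit member boundaries, as the abstract describes, matter in practice.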

AVAILABILITY

All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.

Similar articles

1. HomologMiner: looking for homologous genomic groups in whole genomes.
Bioinformatics. 2007 Apr 15;23(8):917-25. doi: 10.1093/bioinformatics/btm048. Epub 2007 Feb 18.
2. Tandem repeats over the edit distance.
Bioinformatics. 2007 Jan 15;23(2):e30-5. doi: 10.1093/bioinformatics/btl309.
3. CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes.
BMC Bioinformatics. 2006 Oct 24;7:472. doi: 10.1186/1471-2105-7-472.
4. Indel seeds for homology search.
Bioinformatics. 2006 Jul 15;22(14):e341-9. doi: 10.1093/bioinformatics/btl263.
5. RBR: library-less repeat detection for ESTs.
Bioinformatics. 2006 Sep 15;22(18):2232-6. doi: 10.1093/bioinformatics/btl368. Epub 2006 Jul 12.
6. WindowMasker: window-based masker for sequenced genomes.
Bioinformatics. 2006 Jan 15;22(2):134-41. doi: 10.1093/bioinformatics/bti774. Epub 2005 Nov 15.
7. Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages.
BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S6. doi: 10.1186/1471-2105-8-S4-S6.
8. AIMIE: a web-based environment for detection and interpretation of significant sequence motifs in prokaryotic genomes.
Bioinformatics. 2008 Apr 15;24(8):1041-8. doi: 10.1093/bioinformatics/btn077. Epub 2008 Feb 26.
9. GenoMiner: a tool for genome-wide search of coding and non-coding conserved sequence tags.
Bioinformatics. 2006 Feb 15;22(4):497-9. doi: 10.1093/bioinformatics/bti754. Epub 2005 Nov 2.
10. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.
Bioinformatics. 2007 May 1;23(9):1061-7. doi: 10.1093/bioinformatics/btm071. Epub 2007 Mar 1.

Cited by

1. Identification of both copy number variation-type and constant-type core elements in a large segmental duplication region of the mouse genome.
BMC Genomics. 2013 Jul 8;14:455. doi: 10.1186/1471-2164-14-455.