Breaking the computational barrier: a divide-conquer and aggregate based approach for Alu insertion site characterisation.

Authors

Zhang Kun, Fan Wei, Deininger Prescott, Edwards Andrea, Xu Zujia, Zhu Dongxiao

Affiliation

Department of Computer Science, Xavier University of Louisiana, New Orleans, Louisiana 70125, USA.

Publication information

Int J Comput Biol Drug Des. 2009;2(4):302-22. doi: 10.1504/IJCBDD.2009.030763. Epub 2009 Jan 4.

Abstract

Insertion site characterisation of Alu elements is an important problem in primate-specific bioinformatics research. The problem poses three key challenges: the data do not come as pre-defined feature vectors suitable for predictive model construction; general patterns must be discovered, and biological insight drawn from them, without any prior knowledge; and compact yet discriminative patterns must be obtained from a search space of 4^200. This paper provides an integrated algorithmic framework for fulfilling these mining tasks. Compared to the benchmark biological study, our results provide a more refined analysis of the patterns involved in Alu insertion. In particular, we acquire a 200nt predictive profile around the primary insertion site which not only contains the widely accepted consensus, but also suggests a longer pattern (T(7)AA[G'A]AATAA). This pattern provides more insight into the favourable sequence variations allowed for preferred binding and cleavage by the L1 ORF2 endonuclease. The proposed method is general enough to be applied to other sequence detection problems as well, such as microRNA target prediction.
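To make the scale of the search space and the shape of the reported pattern concrete, the following is a minimal sketch in Python. It is not the paper's divide-conquer and aggregate method. It assumes the pattern in the abstract reads as seven consecutive Ts, then AA, then either G or A, then AATAA; that reading, the function names, and the example sequence are all illustrative assumptions.

```python
import re

# ASSUMPTION: the abstract's pattern is read as seven Ts, then "AA",
# then one of G or A, then "AATAA". This reading is not confirmed by the source.
PATTERN_SOURCE = r"T{7}AA[GA]AATAA"

# A zero-width lookahead lets overlapping occurrences be reported too.
OVERLAPPING_PATTERN = re.compile(r"(?=" + PATTERN_SOURCE + r")")


def find_pattern_sites(sequence):
    """Return 0-based start positions of every match of the assumed pattern."""
    return [m.start() for m in OVERLAPPING_PATTERN.finditer(sequence.upper())]


def search_space_size(length, alphabet_size=4):
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    # Hypothetical sequence, made up for illustration only.
    example = "CC" + "TTTTTTTAAGAATAA" + "GG"
    print(find_pattern_sites(example))
    # Number of decimal digits in 4^200, the size of the 200nt search space.
    print(len(str(search_space_size(200))))
```

The digit count shows why exhaustive enumeration over 200nt windows is infeasible, which is the computational barrier the title refers to.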
