
LOESS correction for length variation in gene set-based genomic sequence analysis.

Affiliation

Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication

Bioinformatics. 2012 Jun 1;28(11):1446-54. doi: 10.1093/bioinformatics/bts155. Epub 2012 Apr 5.

Abstract

MOTIVATION

Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts.

RESULTS

Several cis-regulatory module discovery algorithms produce DNA sequence scores that depend substantially on sequence length. Our newly developed LOESS method flexibly captures diverse score-length relationships and corrects DNA sequence scores for length-dependent artifacts more effectively than four other approaches. When genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development were scored with the Lever motif analysis algorithm and those scores were corrected with this method, their biologically validated cis-regulatory codes were successfully recovered. The LOESS length-correction method is broadly applicable and may be useful not only for more accurate inference of cis-regulatory codes but also for detecting other types of patterns in biological sequences.
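The abstract does not spell out the correction step. The following is a rough Python sketch, not the authors' released code. It fits a local linear LOESS curve (tricube weights, nearest-neighbour span, a single pass with no robustness iterations) of score against log10 sequence length and takes the residual from that trend as the length-corrected score. The function names `loess_fit` and `loess_length_correct`, the default `span` of 0.5, the log10 transform, and the use of plain residuals are all assumptions made for illustration. The paper's actual transformation may differ, for example by also rescaling for local variance.

```python
import math
from typing import List, Sequence


def _tricube(u: float) -> float:
    """Tricube kernel (1 - |u|^3)^3 on |u| < 1, zero outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_fit(x: Sequence[float], y: Sequence[float], span: float = 0.5) -> List[float]:
    """Local linear LOESS (single pass, no robustness iterations).

    For each x[i], the ceil(span * n) nearest points (at least 3) receive
    tricube weights and a weighted least-squares line is fitted; the value
    of that line at x[i] is returned as the fitted trend.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    k = max(3, min(n, math.ceil(span * n)))
    fitted: List[float] = []
    for xi in x:
        # Bandwidth = distance to the k-th nearest point (self included).
        h = sorted(abs(xj - xi) for xj in x)[k - 1]
        if h == 0.0:
            # All k nearest points tie with xi: average the tied values.
            w = [1.0 if xj == xi else 0.0 for xj in x]
        else:
            h *= 1.0001  # keep the k-th neighbour's weight slightly above zero
            w = [_tricube((xj - xi) / h) for xj in x]
        sw = sum(w)
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(ybar + slope * (xi - xbar))
    return fitted


def loess_length_correct(
    scores: Sequence[float], lengths: Sequence[int], span: float = 0.5
) -> List[float]:
    """Remove the LOESS trend of score on log10(length); return residuals."""
    if any(L <= 0 for L in lengths):
        raise ValueError("lengths must be positive")
    xs = [math.log10(L) for L in lengths]
    trend = loess_fit(xs, scores, span)
    return [s - t for s, t in zip(scores, trend)]


if __name__ == "__main__":
    # Toy data: scores rise with log length, plus one sequence with extra signal.
    lengths = list(range(100, 2001, 100))
    scores = [2.0 + 3.0 * math.log10(L) for L in lengths]
    scores[10] += 5.0
    for L, r in zip(lengths, loess_length_correct(scores, lengths)):
        print(L, round(r, 3))
```

In this toy run, sequences whose scores merely follow the length trend end up with residuals near zero, while the one sequence given extra signal keeps a clearly positive residual. Because LOESS fits the trend locally rather than assuming a fixed functional form, the same code adapts to linear, saturating, or other score-length shapes.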

AVAILABILITY

Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/
