LOESS correction for length variation in gene set-based genomic sequence analysis.

Affiliation

Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication information

Bioinformatics. 2012 Jun 1;28(11):1446-54. doi: 10.1093/bioinformatics/bts155. Epub 2012 Apr 5.

DOI: 10.1093/bioinformatics/bts155
PMID: 22492312
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3356840/
Abstract

MOTIVATION

Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts.

RESULTS

Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences.
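The general idea of a LOESS length correction can be sketched as follows: fit a smooth, locally weighted regression of sequence score on sequence length, then subtract the fitted trend so that what remains no longer depends on length. This is a minimal illustration only, not the authors' implementation (their code is at the URL under AVAILABILITY). The function names `loess_fit`, `length_correct` and `pearson`, the tricube kernel, the span `frac=0.5`, the choice to correct by subtraction, and the synthetic linear score-length data are all assumptions made for the example.

```python
import random


def loess_fit(x, y, frac=0.5):
    """Locally weighted linear regression (tricube kernel), evaluated at each x[i]."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours defining the local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def length_correct(lengths, scores, frac=0.5):
    """Assumed correction: subtract the LOESS-fitted score-versus-length trend."""
    trend = loess_fit(lengths, scores, frac)
    return [s - t for s, t in zip(scores, trend)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5


# Synthetic data (not from the paper): scores that grow with sequence length.
random.seed(0)
lengths = [random.uniform(500, 5000) for _ in range(200)]
scores = [0.002 * L + random.gauss(0, 1.0) for L in lengths]

corrected = length_correct(lengths, scores)
print("score-length correlation before:", round(pearson(lengths, scores), 3))
print("score-length correlation after: ", round(pearson(lengths, corrected), 3))
```

Because the fit is local, the same procedure can follow non-linear score-length relationships as well; the linear trend here only keeps the example easy to check. In a gene set setting, the trend would presumably be estimated from a background set and applied to the sequences of interest, but that detail is not stated in the abstract.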

AVAILABILITY

Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/

Similar articles

1. LOESS correction for length variation in gene set-based genomic sequence analysis.
   Bioinformatics. 2012 Jun 1;28(11):1446-54. doi: 10.1093/bioinformatics/bts155. Epub 2012 Apr 5.
2. DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements.
   Nucleic Acids Res. 2019 Sep 5;47(15):7781-7797. doi: 10.1093/nar/gkz617.
3. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo.
   BMC Bioinformatics. 2002 Oct 24;3:30. doi: 10.1186/1471-2105-3-30.
4. Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura.
   Bioinformatics. 2004 Nov 1;20(16):2738-50. doi: 10.1093/bioinformatics/bth320. Epub 2004 May 14.
5. Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes.
   Nucleic Acids Res. 2004 May 20;32(9):2889-900. doi: 10.1093/nar/gkh614. Print 2004.
6. REDUCE: An online tool for inferring cis-regulatory elements and transcriptional module activities from microarray data.
   Nucleic Acids Res. 2003 Jul 1;31(13):3487-90. doi: 10.1093/nar/gkg630.
7. Background rareness-based iterative multiple sequence alignment algorithm for regulatory element detection.
   Bioinformatics. 2003 Oct 12;19(15):1952-63. doi: 10.1093/bioinformatics/btg266.
8. Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA.
   Bioinformatics. 2006 Dec 1;22(23):2858-64. doi: 10.1093/bioinformatics/btl499. Epub 2006 Oct 10.
9. Searching for statistically significant regulatory modules.
   Bioinformatics. 2003 Oct;19 Suppl 2:ii16-25. doi: 10.1093/bioinformatics/btg1054.
10. De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets.
    BMC Genomics. 2014 Dec 2;15:1047. doi: 10.1186/1471-2164-15-1047.

Cited by

1. A multi-bin rarefying method for evaluating alpha diversities in TCR sequencing data.
   Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae431.
2. Improving CNV Detection Performance in Microarray Data Using a Machine Learning-Based Approach.
   Diagnostics (Basel). 2023 Dec 29;14(1):84. doi: 10.3390/diagnostics14010084.
3. The clinical significance of RET gene fusion among Chinese patients with lung cancer.
   Transl Cancer Res. 2020 Oct;9(10):6455-6463. doi: 10.21037/tcr-20-754.
4. Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos.
   Nat Methods. 2013 Aug;10(8):774-80. doi: 10.1038/nmeth.2558. Epub 2013 Jul 14.
5. Robust shifts in S100a9 expression with aging: a novel mechanism for chronic inflammation.
   Sci Rep. 2013;3:1215. doi: 10.1038/srep01215. Epub 2013 Feb 5.