Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America.
PLoS Comput Biol. 2010 Jan 29;6(1):e1000652. doi: 10.1371/journal.pcbi.1000652.
We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif-function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruit fly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.
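As a rough illustration of what a motif-function association test can look like, the sketch below scores the overlap between a motif's predicted target genes and a functional gene set with a one-sided hypergeometric test, then applies a Bonferroni correction. The abstract does not name the test or the correction used by the framework, so both are assumptions made for illustration, as are the function names `hypergeom_enrichment_pvalue` and `significant_associations` and the toy gene identifiers.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genes, n_targets, n_in_set, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_genes   : size of the gene universe
    n_targets : number of predicted targets of the motif
    n_in_set  : number of genes in the functional gene set
    n_overlap : number of genes that are both targets and in the set
    """
    upper = min(n_targets, n_in_set)
    total = comb(n_genes, n_targets)
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def significant_associations(targets_by_motif, gene_sets, universe, alpha=0.05):
    """Return (motif, gene_set, p_value) triples passing a Bonferroni threshold.

    Assumption: every motif is tested against every gene set, so the number
    of tests is len(targets_by_motif) * len(gene_sets).
    """
    universe = set(universe)
    n_tests = len(targets_by_motif) * len(gene_sets)
    hits = []
    for motif, targets in targets_by_motif.items():
        targets = set(targets) & universe
        for name, members in gene_sets.items():
            members = set(members) & universe
            overlap = len(targets & members)
            p = hypergeom_enrichment_pvalue(
                len(universe), len(targets), len(members), overlap
            )
            if p * n_tests <= alpha:
                hits.append((motif, name, p))
    return sorted(hits, key=lambda hit: hit[2])


# Toy example with hypothetical gene and motif names.
universe = [f"g{i}" for i in range(10)]
targets_by_motif = {"motif_A": ["g0", "g1", "g2", "g3", "g4"]}
gene_sets = {"set_1": ["g0", "g1", "g2", "g3", "g4"]}
hits = significant_associations(targets_by_motif, gene_sets, universe)
```

A test of this kind treats each motif-function pair independently, so it does not address the redundancy among significant associations that the abstract mentions; handling that would need an additional step that is not sketched here.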