School of Computer Science, McGill University, 3630 University, Montreal, QC, Canada H3A 2B2.
Bioinformatics. 2011 Jul 1;27(13):i266-74. doi: 10.1093/bioinformatics/btr241.
The identification of non-coding functional regions of the human genome remains one of the main challenges of genomics. By observing how a given region evolved over time, one can detect signs of negative or positive selection hinting that the region may be functional. With the quickly increasing number of vertebrate genomes to compare with our own, this type of approach is set to become extremely powerful, provided the right analytical tools are available.
A large number of approaches have been proposed to measure signs of past selective pressure, usually in the form of reduced mutation rate. Here, we propose a radically different approach to the detection of non-coding functional regions: instead of measuring past evolutionary rates, we build a machine learning classifier to predict current substitution rates in human based on the inferred evolutionary events that affected the region during vertebrate evolution. We show that different types of evolutionary events, occurring along different branches of the phylogenetic tree, bring very different amounts of information. We propose a number of simple machine learning classifiers and show that a support vector machine (SVM) predictor clearly outperforms existing tools at predicting human non-coding functional sites. Comparison to external evidence of selection and regulatory function confirms that these SVM predictions are more accurate than those of other approaches.
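To make the idea of classifying sites from inferred evolutionary events concrete, here is a minimal sketch, not the authors' predictor. It assumes each site is summarised by four hypothetical event counts (for example substitutions or indels on groups of branches), draws synthetic data with made-up rates, and trains a linear soft-margin SVM by stochastic subgradient descent on the hinge loss. The function names, the feature layout, the linear kernel and all numeric settings are illustrative assumptions; the abstract does not specify any of them.

```python
import random


def make_toy_sites(n, rng):
    """Synthetic sites (assumption, not real data): each of 4 features is a
    count of inferred events on a hypothetical group of branches."""
    sites = []
    for _ in range(n):
        constrained = rng.random() < 0.5
        p = 0.02 if constrained else 0.25  # made-up per-trial event probability
        x = [sum(rng.random() < p for _ in range(10)) for _ in range(4)]
        sites.append((x, 1 if constrained else -1))
    return sites


def train_linear_svm(sites, lam=0.01, eta=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM fitted by stochastic subgradient descent on
    lam/2 * |w|^2 + hinge loss, with a fixed learning rate eta."""
    rng = random.Random(seed)
    w = [0.0] * len(sites[0][0])
    b = 0.0  # the bias term is left unregularised
    for _ in range(epochs):
        order = sites[:]
        rng.shuffle(order)
        for x, y in order:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink step: gradient of the L2 regulariser.
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move towards classifying x correctly.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b


def accuracy(w, b, sites):
    """Fraction of sites whose predicted sign matches the label."""
    hits = 0
    for x, y in sites:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += (1 if score >= 0 else -1) == y
    return hits / len(sites)


if __name__ == "__main__":
    data_rng = random.Random(0)
    train = make_toy_sites(200, data_rng)
    held_out = make_toy_sites(200, data_rng)
    w, b = train_linear_svm(train)
    print("weights:", w, "bias:", b)
    print("held-out accuracy:", accuracy(w, b, held_out))
```

In this toy setting a positive score marks a site as likely constrained, so the learned weights on the event counts should come out negative: more inferred events make constraint less likely. A real predictor would need features derived from an actual multiple alignment and ancestral reconstruction, and labels derived from human substitution data, neither of which is modelled here.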
The predictor and predictions made are available at http://www.mcb.mcgill.ca/~blanchem/sadri.