Department of ISOM, HKUST, Clear Water Bay, Kowloon, Hong Kong.
Bioinformatics. 2012 Nov 1;28(21):2834-42. doi: 10.1093/bioinformatics/bts531. Epub 2012 Sep 3.
Epistasis, or gene-gene interaction, has gained increasing attention in studies of complex diseases, and it has been proposed to be a ubiquitous component of the genetic architecture of common human diseases. Detecting gene-gene interactions is nevertheless difficult because of combinatorial explosion: the number of candidate gene combinations grows rapidly with the number of genes and with the order of interaction considered.
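To give a sense of the scale of the combinatorial explosion, the following minimal sketch counts the candidate gene sets at a given interaction order. The function name `count_interactions` and the figure of 5000 genes are illustrative assumptions, not values taken from the study.

```python
from math import comb


def count_interactions(num_genes: int, order: int) -> int:
    """Number of distinct gene sets of size `order` drawn from `num_genes` genes."""
    return comb(num_genes, order)


# Hypothetical dataset size, chosen only for illustration.
num_genes = 5000

pairs = count_interactions(num_genes, 2)    # two-way interactions
triples = count_interactions(num_genes, 3)  # three-way interactions

print(f"{num_genes} genes: {pairs:,} pairs, {triples:,} triples")
```

Under this assumed size there are already more than twelve million pairs and more than twenty billion triples, which is why exhaustive search beyond low orders quickly becomes impractical.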
We present a novel feature selection method that incorporates variable interaction. Three gene expression datasets are analyzed to illustrate the method, although it can also be applied to other types of high-dimensional data. The quality of the selected variables is evaluated in two ways: by classification error rates, and by functional relevance assessed using biological knowledge. First, we show that classification error rates can be significantly reduced by considering interactions. Second, a sizable portion of the genes identified by our method for breast cancer metastasis overlaps with those reported as disease-associated in the gene-to-system breast cancer (G2SBC) database, and some of them have interesting biological implications. In summary, interaction-based methods may lead to substantial gains in biological insight as well as more accurate prediction.
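Why considering interactions can lower classification error is easiest to see in a toy case. The sketch below is not the authors' method: it uses a hypothetical four-sample dataset in which the label is the exclusive-or of two binary variables, and a simple majority-vote rule (`majority_rule_error`, an illustrative helper) evaluated on the training data itself.

```python
from collections import Counter, defaultdict
from itertools import product


def majority_rule_error(samples, feature_indices):
    """Training error of a rule that predicts the majority label within each
    cell defined by the values of the chosen features."""
    cells = defaultdict(Counter)
    for features, label in samples:
        key = tuple(features[i] for i in feature_indices)
        cells[key][label] += 1
    # In each cell, every sample outside the majority label is a mistake.
    mistakes = sum(sum(c.values()) - max(c.values()) for c in cells.values())
    return mistakes / len(samples)


# Toy data (assumption): the label is x1 XOR x2, so neither variable is
# informative on its own, but the pair determines the label exactly.
samples = [((x1, x2), x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

error_x1_alone = majority_rule_error(samples, [0])
error_x2_alone = majority_rule_error(samples, [1])
error_joint = majority_rule_error(samples, [0, 1])

print(error_x1_alone, error_x2_alone, error_joint)
```

Each variable alone does no better than chance here, while the pair classifies every sample correctly. A selection procedure that scores variables one at a time would discard both, which is the kind of signal an interaction-aware method is meant to retain.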