Demey J R, Vicente-Villardón J L, Galindo-Villardón M P, Zambrano A Y
Centro de Biotecnología, Instituto de Estudios Avanzados (IDEA), Caracas, Venezuela.
Bioinformatics. 2008 Dec 15;24(24):2832-8. doi: 10.1093/bioinformatics/btn552. Epub 2008 Oct 29.
Several molecular techniques, most of which yield a binary data matrix, have been used to characterize the genetic diversity of genotypes. Cluster Analysis (CA) and Principal Coordinates Analysis (PCoA) are commonly used to classify genotypes from DNA molecular markers, even though neither method makes it straightforward to identify the variables responsible for the grouping. In this article, we present a novel algorithm that combines PCoA, CA and Logistic Regression (LR) to better interpret the variables (alleles or bands) associated with the classification of genotypes. Combining these three standard techniques with some new ideas about the geometry of the procedures allows the construction of an External Logistic Biplot (ELB), which helps to interpret the variables responsible for the classification or ordination. An application of the method to the genetic diversity of four populations from Africa, Asia and Europe, using the HapMap data, is included.
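A minimal Python sketch of the PCoA + logistic regression core of an external logistic biplot follows. It is an illustration under stated assumptions, not the authors' Matlab code. Assumptions: genotypes are compared with the simple matching coefficient (the paper may use a different similarity), eigenvectors are found by power iteration so the block needs only the standard library, and each marker's logistic regression carries a small ridge penalty (the hypothetical `ridge` parameter) so that coefficients stay finite when a band perfectly separates groups. The cluster-analysis step is omitted, and all function names are illustrative.

```python
import math


def simple_matching_sq_dist(X):
    """Squared distance d2_ij = 1 - s_ij, where s_ij is the simple matching
    similarity (fraction of markers on which genotypes i and j agree)."""
    n, p = len(X), len(X[0])
    return [[sum(a != b for a, b in zip(X[i], X[j])) / p for j in range(n)]
            for i in range(n)]


def double_center(D2):
    """Gower's centred matrix B = -1/2 * J D2 J, with J = I - 11'/n."""
    n = len(D2)
    mean = [sum(row) / n for row in D2]  # row means = column means (symmetric)
    grand = sum(mean) / n
    return [[-0.5 * (D2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigenpairs(B, k, iters=1000):
    """Leading k eigenpairs of a symmetric positive semidefinite matrix by
    power iteration with deflation (adequate only for small examples)."""
    n = len(B)
    B = [row[:] for row in B]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.37 * i for i in range(n)]  # arbitrary non-constant start
        lam = 0.0
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam == 0.0:
                break
            v = [x / lam for x in w]
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def pcoa(X, k=2):
    """Principal Coordinates Analysis: rows of X -> k-dimensional coordinates."""
    pairs = top_eigenpairs(double_center(simple_matching_sq_dist(X)), k)
    coords = [[v[i] * math.sqrt(lam) for lam, v in pairs] for i in range(len(X))]
    return coords, [lam for lam, _ in pairs]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict(b, z):
    """Fitted probability that a marker is present at biplot point z."""
    return sigmoid(b[0] + sum(bj * zj for bj, zj in zip(b[1:], z)))


def fit_logistic(Z, y, ridge=0.1, lr=0.5, iters=3000):
    """Fit logit P(y=1) = b0 + b1*z1 + ... by gradient ascent on the
    log-likelihood minus (ridge/2)*||slopes||^2 (intercept unpenalised)."""
    n, k = len(Z), len(Z[0])
    b = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for zi, yi in zip(Z, y):
            r = yi - predict(b, zi)  # residual on the probability scale
            grad[0] += r
            for j in range(k):
                grad[j + 1] += r * zi[j]
        for j in range(k + 1):
            penalty = ridge * b[j] if j > 0 else 0.0
            b[j] += lr * (grad[j] - penalty) / n
    return b


def external_logistic_biplot(X, k=2, ridge=0.1):
    """PCoA coordinates of the genotypes plus one logistic regression per
    marker, fitted externally on those coordinates."""
    coords, eigvals = pcoa(X, k)
    coefs = [fit_logistic(coords, [row[j] for row in X], ridge)
             for j in range(len(X[0]))]
    return coords, eigvals, coefs
```

With `X` a 0/1 list of lists (genotypes by bands), `coords, eigvals, coefs = external_logistic_biplot(X)` places the genotypes on the first two principal coordinates. For marker `j`, the slopes `coefs[j][1:]` give the direction in that plane along which its fitted probability of presence increases. For real data sets, a library eigen-solver and logistic-regression routine would replace the hand-written loops used here.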
The Matlab code for implementing the methods may be obtained from the web site: http://biplot.usal.es.