Plyusnin Ilya, Evans Alistair R, Karme Aleksis, Gionis Aristides, Jernvall Jukka
Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
PLoS One. 2008 Mar 5;3(3):e1742. doi: 10.1371/journal.pone.0001742.
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analysis of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable mammalian tooth morphologies from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental type (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures. These attributes were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with a 1-nearest-neighbor classifier was the best approach. However, several other classification models combined with the same search scheme also proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will become increasingly valuable as 3D morphology and phenomics databases grow.
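The best-performing combination pairs a best-first search over feature subsets with a 1-nearest-neighbor classifier. The following is a minimal sketch of that wrapper idea. It makes several assumptions that are not stated in the abstract: Euclidean distance, leave-one-out accuracy as the search criterion, forward expansion starting from the empty set, and termination after `max_stale` consecutive non-improving expansions. The function names and the toy data are hypothetical illustrations, not the authors' implementation or settings.

```python
import heapq
import math


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier that uses
    Euclidean distance restricted to the given feature indices (assumed metric)."""
    n = len(X)
    if not features or n < 2:
        return 0.0
    correct = 0
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j == i:  # hold out sample i
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_j = d, j
        if y[best_j] == y[i]:
            correct += 1
    return correct / n


def best_first_selection(X, y, max_stale=5):
    """Forward best-first search over feature subsets (hypothetical sketch).
    Always expands the highest-scoring open subset and stops after
    `max_stale` expansions in a row that fail to improve the best score."""
    n_features = len(X[0])
    start = frozenset()
    best_set, best_score = start, loo_1nn_accuracy(X, y, start)
    counter = 0  # tie-breaker so the heap never compares frozensets
    open_list = [(-best_score, 0, counter, start)]
    visited = {start}
    stale = 0
    while open_list and stale < max_stale:
        _, _, _, subset = heapq.heappop(open_list)
        improved = False
        for f in range(n_features):
            if f in subset:
                continue
            child = subset | {f}
            if child in visited:
                continue
            visited.add(child)
            score = loo_1nn_accuracy(X, y, child)
            counter += 1
            # Higher accuracy first, then smaller subsets.
            heapq.heappush(open_list, (-score, len(child), counter, child))
            if score > best_score or (score == best_score and len(child) < len(best_set)):
                best_set, best_score = child, score
                improved = True
        stale = 0 if improved else stale + 1
    return sorted(best_set), best_score


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]
    y = ["a", "a", "a", "b", "b", "b"]
    print(best_first_selection(X, y))
```

On the toy data the search keeps only the informative feature. Using the classifier itself to score each candidate subset (a wrapper approach) is what couples the search scheme to the classification model. This coupling is why different classifiers paired with the same search can yield different feature sets.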