Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123, Italy.
Max Planck Institute for Intelligent Systems, Spemannstraße 34, Tübingen, 72076, Germany.
BMC Bioinformatics. 2018 Mar 8;19(Suppl 2):49. doi: 10.1186/s12859-018-2033-5.
Convolutional Neural Networks can be used effectively only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.
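As a rough illustration of the embedding step, the sketch below computes patristic distances (the sum of branch lengths along the path joining two leaves) on a small tree and embeds the leaves with classical MultiDimensional Scaling. This is not the authors' implementation: the toy tree `TREE`, the leaf names, and the helper functions are hypothetical, and plain classical MDS stands in for the sparsified variant mentioned above.

```python
import numpy as np

# Hypothetical toy tree: child -> (parent, branch length). Leaves are A, B, C, D.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("n2", 1.5),
    "D": ("n2", 0.5),
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
}
LEAVES = ["A", "B", "C", "D"]


def path_to_root(node):
    """Map each ancestor of `node` (and `node` itself) to its distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in TREE:
        parent, length = TREE[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic(a, b):
    """Sum of branch lengths on the tree path between leaves a and b."""
    pa, pb = path_to_root(a), path_to_root(b)
    # The lowest common ancestor is the shared node with the smallest total distance.
    return min(pa[n] + pb[n] for n in pa if n in pb)


def patristic_matrix(leaves):
    """Symmetric matrix of pairwise patristic distances."""
    return np.array([[patristic(a, b) for b in leaves] for a in leaves])


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS; assumption: used here instead of sparsified MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)    # drop negative eigenvalues
    return eigvecs[:, order] * np.sqrt(vals)


D = patristic_matrix(LEAVES)
coords = classical_mds(D, n_components=2)        # one 2-D point per leaf
```

Tree metrics are generally not exactly Euclidean, so the embedded distances only approximate the patristic ones; the number of retained components controls that trade-off.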
Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota from 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms such as Support Vector Machines and Random Forest, and to a baseline fully connected neural network, the Multi-Layer Perceptron.
Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
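The idea of handing a convolution both the data and a ranked neighbour list can be sketched in plain NumPy. This is an illustration only, not the Keras layer described above: it assumes that neighbourhoods are defined over the input variables through a distance matrix (in line with the patristic distance between variables), and the function names `ranked_neighbours` and `neighbourhood_conv` are hypothetical.

```python
import numpy as np


def ranked_neighbours(D, k):
    """Indices of the k nearest variables for each variable, closest first.

    With a zero diagonal and positive off-diagonal distances,
    each variable is its own first neighbour.
    """
    return np.argsort(D, axis=1, kind="stable")[:, :k]


def neighbourhood_conv(X, neighbours, weights):
    """Apply one shared filter over each variable's ranked neighbourhood.

    X:          (n_samples, n_vars) data matrix
    neighbours: (n_vars, k) integer indices from ranked_neighbours
    weights:    (k,) filter weights, one per neighbour rank
    Returns an array of shape (n_samples, n_vars).
    """
    patches = X[:, neighbours]      # (n_samples, n_vars, k)
    return patches @ weights        # weighted sum over each neighbourhood


# Toy example: 3 variables, 1 sample, neighbourhood size 2.
D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])
X = np.array([[1.0, 2.0, 3.0]])
idx = ranked_neighbours(D, k=2)     # [[0, 1], [1, 0], [2, 1]]
out = neighbourhood_conv(X, idx, np.array([1.0, 0.5]))
# out is [[2.0, 2.5, 4.0]]
```

Gathering the neighbours first turns the operation into an ordinary weighted sum with shared weights, which is what makes it analogous to a convolution over a pixel patch.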