Phylogenetic convolutional neural networks in metagenomics.

Affiliations

Fondazione Bruno Kessler (FBK), Via Sommarive 18 Povo, Trento, I-38123, Italy.

Max Planck Institute for Intelligent Systems, Spemannstraße 34, Tübingen, 72076, Germany.

Publication information

BMC Bioinformatics. 2018 Mar 8;19(Suppl 2):49. doi: 10.1186/s12859-018-2033-5.

Abstract

BACKGROUND

Convolutional Neural Networks can be used effectively only when the data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. Here we introduce Ph-CNN, a novel deep learning architecture for the classification of metagenomics data, based on Convolutional Neural Networks, in which the patristic distance defined on the phylogenetic tree serves as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.
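As a rough illustration of the embedding step, the sketch below applies classical (Torgerson) MDS to a small patristic distance matrix and then ranks each variable's nearest neighbours in the embedded space. It is not the sparsified MDS used by Ph-CNN, whose details are not given in this abstract; the four-leaf toy tree, the matrix `PATRISTIC`, and the helper names `classical_mds` and `ranked_neighbours` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy tree ((A:1,B:1):1,(C:1,D:1):1): siblings are 2 apart,
# leaves in different clades are 4 apart (sum of branch lengths on the path).
PATRISTIC = np.array([
    [0.0, 2.0, 4.0, 4.0],
    [2.0, 0.0, 4.0, 4.0],
    [4.0, 4.0, 0.0, 2.0],
    [4.0, 4.0, 2.0, 0.0],
])

def classical_mds(dist, n_components):
    """Embed points in Euclidean space from a pairwise distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives caused by
    # rounding or by distances that are not exactly Euclidean.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

def ranked_neighbours(coords, k):
    """Return, for each point, the indices of its k closest points (itself first)."""
    diff = coords[:, None, :] - coords[None, :, :]
    eucl = np.sqrt((diff ** 2).sum(axis=-1))
    return np.argsort(eucl, axis=1, kind="stable")[:, :k]

coords = classical_mds(PATRISTIC, n_components=3)
neighbours = ranked_neighbours(coords, k=2)
print(neighbours)
```

For this toy tree three dimensions reproduce the patristic distances exactly, so each leaf's closest neighbour after itself is its sibling. Real phylogenies generally need more dimensions and embed only approximately.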

RESULTS

Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota from 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared with classical algorithms such as Support Vector Machines and Random Forest, and with a baseline fully connected neural network, the Multi-Layer Perceptron.

CONCLUSION

Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operationally, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
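The idea of handing a convolution both the data and a ranked neighbour list can be sketched in plain NumPy. This is only an illustration and not the authors' Keras layer: the function `neighbourhood_conv`, the assumption that the neighbour list is indexed per input variable, and the toy values are all hypothetical.

```python
import numpy as np

def neighbourhood_conv(samples, neighbours, kernel, bias=0.0):
    """Apply one shared filter over each variable's ranked neighbourhood.

    samples:    (n_samples, n_features) abundance-like values
    neighbours: (n_features, k) indices of each variable's k nearest
                variables, closest first (assumed precomputed)
    kernel:     (k,) filter weights, one per neighbour rank
    """
    x = np.asarray(samples, dtype=float)
    idx = np.asarray(neighbours, dtype=int)
    w = np.asarray(kernel, dtype=float)
    # Gather a k-wide patch per variable, analogous to a pixel window.
    patches = x[:, idx]            # shape (n_samples, n_features, k)
    # Weighted sum over the neighbour axis, same weights everywhere.
    return patches @ w + bias      # shape (n_samples, n_features)

# Toy input: one sample, four variables, each paired with its closest neighbour.
samples = np.array([[1.0, 2.0, 3.0, 4.0]])
neighbours = np.array([[0, 1], [1, 0], [2, 3], [3, 2]])
kernel = np.array([1.0, 0.5])

out = neighbourhood_conv(samples, neighbours, kernel)
print(out)  # [[2.  2.5 5.  5.5]]
```

Because the filter weights are tied to neighbour rank rather than to position in the input vector, the same kernel is reused for every variable, which is the property an image convolution gets for free from the pixel grid.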
