AMAnD: an automated metagenome anomaly detection methodology utilizing DeepSVDD neural networks.

Affiliation

Life Science Resource Center, MRIGlobal, Gaithersburg, MD, United States.

Publication information

Front Public Health. 2023 Jul 11;11:1181911. doi: 10.3389/fpubh.2023.1181911. eCollection 2023.

Abstract

The composition of metagenomic communities within the human body often reflects localized medical conditions such as upper respiratory and gastrointestinal diseases. Fast and accurate computational tools that distinguish anomalous metagenomic samples from typical ones are desirable for understanding different phenotypes, especially in settings that involve repeated, long-duration temporal sampling. Here, we present Automated Metagenome Anomaly Detection (AMAnD), which uses two types of Deep Support Vector Data Description (DeepSVDD) models: one trained on the taxonomic feature space output by the Pan-Genomics for Infectious Agents (PanGIA) taxonomy classifier, and one trained on k-mer frequency counts. AMAnD's semi-supervised one-class approach makes no assumptions about what an anomaly looks like, so it can flag potentially novel anomaly types. Three diverse datasets are profiled. The first dataset is hosted on the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) and contains nasopharyngeal swabs from healthy and COVID-19-positive patients. The second dataset is also hosted on SRA and contains gut microbiome samples from normal controls and from patients with slow transit constipation (STC). AMAnD can learn a typical healthy nasopharyngeal or gut microbiome profile and reliably flag the anomalous COVID+ or STC samples in both feature spaces. The final dataset is a synthetic metagenome created with the Critical Assessment of Metagenome Annotation Simulator (CAMISIM). A control dataset of 50 well-characterized organisms was submitted to CAMISIM to generate 100 synthetic control-class samples. The experimental conditions included 12 different spiked-in contaminants that are taxonomically similar to organisms present in the laboratory blank sample. Their taxonomic distances ranged from one branch away at the strain level to one branch away at the family level.
This experiment was repeated in triplicate at three different coverage levels to probe the dependence on sample coverage. AMAnD was again able to flag the contaminant inserts as anomalous. Several properties would make AMAnD well suited to a wide array of applied metagenomic biosurveillance use cases, from environmental to clinical:
- its assumption-free flagging of metagenomic anomalies,
- the potential of the deep learning approach for real-time model training updates,
- its strong performance even with lightweight models trained on few samples.
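
DeepSVDD maps normal training samples through a neural network phi toward a fixed center c, with c commonly set to the mean embedding of the training data. It then scores a new sample by its squared distance ||phi(x) - c||^2. The sketch below illustrates that scoring rule on k-mer frequency features, one of the two feature spaces named above. It is a minimal sketch under stated assumptions, not AMAnD's implementation. The following are all hypothetical choices made for illustration:
- the function names,
- k = 3,
- the 95th-percentile threshold,
- the synthetic training reads,
- the use of a fixed, untrained random linear projection in place of a trained network.

```python
import itertools
import random


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Normalized counts of all 4**k DNA k-mers; windows with non-ACGT symbols are skipped."""
    index = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def make_projection(d_in: int, d_out: int, seed: int = 0) -> list[list[float]]:
    """HYPOTHETICAL stand-in for a trained DeepSVDD network: a fixed, bias-free random linear map."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]


def embed(W: list[list[float]], x: list[float]) -> list[float]:
    """phi(x) = W x (untrained here; a real model would be learned)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_center(W: list[list[float]], train: list[list[float]]) -> list[float]:
    """Center c = mean embedding of the normal (control-class) training samples."""
    Z = [embed(W, x) for x in train]
    return [sum(col) / len(Z) for col in zip(*Z)]


def anomaly_score(W: list[list[float]], c: list[float], x: list[float]) -> float:
    """Squared distance ||phi(x) - c||^2; larger means more anomalous."""
    return sum((zi - ci) ** 2 for zi, ci in zip(embed(W, x), c))


def threshold(scores: list[float], q: float = 0.95) -> float:
    """Assumed decision rule: flag samples scoring above the q-quantile of training scores."""
    s = sorted(scores)
    return s[min(len(s) - 1, int(q * len(s)))]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic "typical" samples: uniform random reads (illustrative only, not real data).
    train = [kmer_frequencies("".join(rng.choice("ACGT") for _ in range(2000))) for _ in range(20)]
    W = make_projection(d_in=64, d_out=8)
    c = fit_center(W, train)
    thr = threshold([anomaly_score(W, c, x) for x in train])
    odd = kmer_frequencies("A" * 2000)  # strongly skewed composition
    print("flagged:", anomaly_score(W, c, odd) > thr)
```

A real DeepSVDD replaces make_projection with a network trained to minimize the mean of these squared distances over the training set. That network is typically built without bias terms, to avoid a trivial solution that maps every input to c. The center-fitting, scoring, and thresholding steps keep the same shape.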


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ebd4/10368493/541e8dd26105/fpubh-11-1181911-g0001.jpg
