PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures.
Affiliations
Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5a, 02-106 Warsaw, Poland.
Department of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Pawinskiego 5a, 02-106 Warsaw, Poland.
Publication information
Nucleic Acids Res. 2018 Apr 6;46(6):e35. doi: 10.1093/nar/gkx1321.
Plasmids are mobile genetic elements that play an important role in the environmental adaptation of microorganisms. Although plasmids are usually analyzed in cultured microorganisms, there is a need for methods that allow the analysis of pools of plasmids (plasmidomes) in environmental samples. To that end, several molecular biology and bioinformatics methods have been developed; however, they are limited to environments with low diversity and cannot recover large plasmids. Here, we present PlasFlow, a novel tool based on genomic signatures that employs a neural network approach to identify bacterial plasmid sequences in environmental samples. PlasFlow can recover plasmid sequences from assembled metagenomes without any prior knowledge of the taxonomic or functional composition of samples, with an accuracy of up to 96%. It can also recover sequences of both circular and linear plasmids and can perform initial taxonomic classification of sequences. Compared to other currently available tools, PlasFlow demonstrated significantly better performance on test datasets. Analysis of two samples from heavy metal-contaminated microbial mats revealed that plasmids may constitute an important fraction of their metagenomes and carry genes involved in heavy-metal homeostasis, proving the pivotal role of plasmids in microorganism adaptation to environmental conditions.
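As an illustration of what a genomic signature can look like, the sketch below computes normalized k-mer frequencies for a single contig. This is a common way to represent a sequence signature, but the abstract does not specify how PlasFlow builds its features, so the function name `kmer_signature`, the choice of `k`, and the handling of ambiguous bases are all assumptions made for this example, not the tool's actual implementation.

```python
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=3):
    """Return normalized k-mer frequencies for a DNA sequence.

    Hypothetical helper for illustration only; not part of PlasFlow.
    """
    sequence = sequence.upper()
    # Enumerate all 4**k possible k-mers over the unambiguous alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}


# Example: dinucleotide signature of a short, made-up contig.
signature = kmer_signature("ATGCGATACGCTTGA", k=2)
```

A vector of such frequencies, one per contig, could then serve as input to a classifier such as a neural network; the fixed ordering of `kmers` keeps the feature positions consistent across contigs.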