Genome sequence-based species delimitation with confidence intervals and improved distance functions.

Affiliation

Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany.

Publication information

BMC Bioinformatics. 2013 Feb 21;14:60. doi: 10.1186/1471-2105-14-60.

Abstract

BACKGROUND

For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) has been based to a large extent on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for the delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes and thereby provides a digital, highly reliable estimator of the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in implementing such an application is to produce digital DDH values that mimic the wet-lab DDH values as closely as possible, to ensure consistency in the prokaryotic species concept.

RESULTS

Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features, such as confidence intervals for intergenomic distances, obtained either via resampling or via the statistical models for DDH prediction, and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, and the improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions.
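
To illustrate the general idea of a resampling-based confidence interval for an intergenomic distance, the following is a minimal sketch of a percentile bootstrap. It is not the GBDP implementation: the toy distance (one minus the fraction of identical bases over matched segments), the function names `distance` and `bootstrap_ci`, and the input data are all assumptions made for illustration only.

```python
import random


def distance(segments):
    """Toy distance: 1 - identities / aligned length.

    This formula is an illustrative assumption, not a GBDP distance function.
    Each segment is a pair (identical_bases, segment_length).
    """
    identities = sum(ident for ident, _ in segments)
    total_length = sum(length for _, length in segments)
    return 1.0 - identities / total_length


def bootstrap_ci(segments, n_replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the toy distance.

    Segments are resampled with replacement; the interval bounds are the
    empirical alpha/2 and 1 - alpha/2 quantiles of the replicate distances.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        distance(rng.choices(segments, k=len(segments)))
        for _ in range(n_replicates)
    )
    lower = estimates[int((alpha / 2) * n_replicates)]
    upper = estimates[int((1 - alpha / 2) * n_replicates) - 1]
    return lower, upper


# Hypothetical (identical bases, segment length) pairs for one genome pair.
segments = [(950, 1000), (480, 500), (1800, 2000), (700, 800), (290, 300)]
point = distance(segments)
lower, upper = bootstrap_ci(segments)
print(f"distance = {point:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Because the interval depends on how the distance aggregates the resampled segments, different distance functions can give intervals of quite different width for the same genome pair, which is consistent with the differences between distance functions reported above.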

CONCLUSIONS

Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/870c/3665452/b6a12c2d51bc/1471-2105-14-60-1.jpg
