Draft versus finished sequence data for DNA and protein diagnostic signature development.

Authors

Gardner Shea N, Lam Marisa W, Smith Jason R, Torres Clinton L, Slezak Tom R

Affiliation

Pathogen Bio-Informatics, Lawrence Livermore National Laboratory, PO Box 808, L-174, Livermore, CA 94551, USA.

Publication information

Nucleic Acids Res. 2005 Oct 20;33(18):5838-50. doi: 10.1093/nar/gki896. Print 2005.

Abstract

Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10^-3 to 10^-5 (approximately 8x coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of approximately 1% (3x to 6x coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
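
The effect of draft error rate on detectable sequence conservation can be illustrated with a toy simulation. The sketch below is not SAP and does not reproduce the authors' method: it assumes substitution-only errors applied independently at each base, a random synthetic reference genome, and exact k-mers shared by every draft as a stand-in for conserved signature candidates. The function names and the choice of k = 18 are hypothetical and chosen only for illustration.

```python
import random

BASES = "ACGT"


def random_genome(length, rng):
    """Return a random synthetic genome (assumption: stands in for a target)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def add_sequencing_errors(seq, error_rate, rng):
    """Substitute each base with probability error_rate (substitutions only)."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            # Replace with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_fraction(reference, n_drafts, error_rate, k=18, seed=0):
    """Fraction of reference k-mers still found intact in every draft copy."""
    rng = random.Random(seed)
    reference_kmers = kmers(reference, k)
    shared = set(reference_kmers)
    for _ in range(n_drafts):
        draft = add_sequencing_errors(reference, error_rate, rng)
        shared &= kmers(draft, k)
    return len(shared) / len(reference_kmers)


if __name__ == "__main__":
    ref = random_genome(5000, random.Random(42))
    # Error rates spanning the low- to high-quality draft range in the abstract.
    for rate in (1e-2, 1e-3, 1e-5):
        print(rate, round(conserved_fraction(ref, n_drafts=5, error_rate=rate), 3))
```

Under this simplified model, a given k-mer survives intact in all n drafts with probability (1 - e)^(k*n). For k = 18 and n = 5 that is roughly 0.40 at e = 10^-2, 0.91 at e = 10^-3, and above 0.99 at e = 10^-5, which is in line with the abstract's qualitative conclusion that errors near 1% in target genomes erode apparent conservation. Real draft data also contain insertions, deletions, and gaps, which this sketch ignores.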

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1203/1266063/ed5350c70551/gki896f1.jpg
