Benchmarking tools for the alignment of functional noncoding DNA.

Authors

Pollard Daniel A, Bergman Casey M, Stoye Jens, Celniker Susan E, Eisen Michael B

Affiliation

Biophysics Graduate Group, University of California, Berkeley, CA 94720, USA.

Publication

BMC Bioinformatics. 2004 Jan 21;5:6. doi: 10.1186/1471-2105-5-6.

Abstract

BACKGROUND

Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences have typically been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.

RESULTS

Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25-3.0 substitutions per site.
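The abstract does not define how sensitivity was scored, so the following is only a minimal sketch under one common assumption: sensitivity is the fraction of residue pairs in the simulated "correct" alignment that a tool's alignment also pairs. The function names `aligned_pairs` and `sensitivity`, the `-` gap character, and the toy sequences are all hypothetical and are not taken from the paper or from ROSE.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) index pairs aligned in a gapped pairwise alignment.

    i and j are 0-based positions in the ungapped sequences A and B.
    Columns containing a gap ('-') in either row contribute no pair.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def sensitivity(reference, predicted, positions=None):
    """Fraction of reference aligned pairs that the predicted alignment recovers.

    reference, predicted: (row_a, row_b) tuples of gapped strings.
    positions: optional set of sequence-A indices (for example, a constrained
    block); when given, only reference pairs at those positions are scored.
    """
    ref = aligned_pairs(*reference)
    if positions is not None:
        ref = {pair for pair in ref if pair[0] in positions}
    if not ref:
        return 0.0
    pred = aligned_pairs(*predicted)
    return len(ref & pred) / len(ref)


# Toy example (made-up sequences): the "correct" alignment has two indels,
# while the tool's alignment is gap-free.
reference = ("ACGT-A", "AC-TGA")
predicted = ("ACGTA", "ACTGA")

overall = sensitivity(reference, predicted)
constrained = sensitivity(reference, predicted, positions={3, 4})
print(overall, constrained)
```

Scoring a local tool this way penalizes unaligned regions, which matches the abstract's remark that incomplete coverage lowers overall sensitivity. Restricting `positions` to the region a tool actually aligned would instead give a within-coverage figure.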

CONCLUSION

For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d97/344529/e8d1c6ac1d17/1471-2105-5-6-1.jpg