A fast and symmetric DUST implementation to mask low-complexity DNA sequences.

Authors

Morgulis Aleksandr, Gertz E Michael, Schäffer Alejandro A, Agarwala Richa

Affiliation

National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20894 USA.

Publication

J Comput Biol. 2006 Jun;13(5):1028-40. doi: 10.1089/cmb.2006.13.1028.

DOI:10.1089/cmb.2006.13.1028
PMID:16796549
Abstract

The DUST module has been used within BLAST for many years to mask low-complexity sequences. In this paper, we present a new implementation of the DUST module that uses the same function to assign a complexity score to a sequence, but uses a different rule by which high-scoring sequences are masked. The new rule masks every nucleotide masked by the old rule and occasionally masks more. The new masking rule corrects two related deficiencies with the old rule. First, the new rule is symmetric with respect to reversing the sequence. Second, the new rule is not context sensitive; the decision to mask a subsequence does not depend on what sequences flank it. The new implementation is at least four times faster than the old on the human genome. We show that both the percentage of additional bases masked and the effect on MegaBLAST outputs are very small.
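The abstract says the old and new implementations share one complexity-scoring function and differ only in the masking rule, but it does not spell out either. The sketch below is a minimal illustration under stated assumptions: it assumes the triplet-count score usually described for DUST, and it uses a deliberately simplified stand-in rule (not the paper's rule, and not claimed to reproduce its output) to show why a rule that judges each interval only by its own letters is symmetric under reversal and does not depend on flanking sequence. The names `dust_score` and `mask_simplified`, and the values `window=64` and `threshold=20`, are hypothetical choices meant to resemble commonly cited dustmasker defaults.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Triplet-based complexity score (ASSUMED form, see lead-in).

    With l = len(seq) - 2 overlapping triplets and c_t the count of triplet t,
    score = sum_t c_t * (c_t - 1) / 2 / (l - 1). Higher means lower complexity.
    """
    l = len(seq) - 2
    if l < 2:
        return 0.0  # fewer than two triplets: nothing to compare
    counts = Counter(seq[i:i + 3] for i in range(l))
    pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return pairs / (l - 1)


def mask_simplified(seq: str, window: int = 64, threshold: float = 20.0) -> str:
    """HYPOTHETICAL simplified masking rule (NOT the paper's rule).

    Lowercases every base covered by some interval of length 4..window whose
    score exceeds `threshold`. Each interval is judged only on its own letters,
    and reversing `seq` maps every interval to one with the same triplet-count
    multiset (hence the same score), so the output is symmetric under reversal.
    Brute force: O(n * window^2), no attempt at speed.
    """
    n = len(seq)
    masked = [False] * n
    for start in range(n):
        for end in range(start + 4, min(n, start + window) + 1):
            if dust_score(seq[start:end]) > threshold:
                for i in range(start, end):
                    masked[i] = True
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))


if __name__ == "__main__":
    # A poly-A run embedded between two arbitrary, non-repetitive flanks.
    left = "GATCCGTACGTTGCAGCATGCTAGCTTGCC"
    right = "TCGGATCCTGCAGTCGACTTGCATGCCTGC"
    seq = left + "A" * 50 + right
    print(mask_simplified(seq))
    # Masking the reversed sequence gives the reversed mask.
    print(mask_simplified(seq[::-1]) == mask_simplified(seq)[::-1])
```

A homopolymer of length 64 scores 31 under this assumed formula, well above the assumed threshold, while the random-looking flanks score near zero. A few flank bases next to the run can still be masked because an interval mostly made of the run may extend into them; this mirrors, in spirit only, the abstract's remark that the new rule occasionally masks more than the old one.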

Similar articles

1
A fast and symmetric DUST implementation to mask low-complexity DNA sequences.
J Comput Biol. 2006 Jun;13(5):1028-40. doi: 10.1089/cmb.2006.13.1028.
2
WindowMasker: window-based masker for sequenced genomes.
Bioinformatics. 2006 Jan 15;22(2):134-41. doi: 10.1093/bioinformatics/bti774. Epub 2005 Nov 15.
3
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
4
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
5
DIALIGN-T: an improved algorithm for segment-based multiple sequence alignment.
BMC Bioinformatics. 2005 Mar 22;6:66. doi: 10.1186/1471-2105-6-66.
6
CLAGen: a tool for clustering and annotating gene sequences using a suffix tree algorithm.
Biosystems. 2006 Jun;84(3):175-82. doi: 10.1016/j.biosystems.2005.11.001. Epub 2005 Dec 27.
7
Pattern locator: a new tool for finding local sequence patterns in genomic DNA sequences.
Bioinformatics. 2006 Dec 15;22(24):3099-100. doi: 10.1093/bioinformatics/btl551. Epub 2006 Nov 8.
8
FBSA: feature-based sequence alignment technique for very large sequences.
Appl Bioinformatics. 2003;2(3):145-50.
9
Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.
BMC Bioinformatics. 2006 Mar 16;7:143. doi: 10.1186/1471-2105-7-143.
10
SRmapper: a fast and sensitive genome-hashing alignment tool.
Bioinformatics. 2013 Feb 1;29(3):316-21. doi: 10.1093/bioinformatics/bts712. Epub 2012 Dec 24.

Cited by

1
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap.
Nat Biotechnol. 2025 Sep 10. doi: 10.1038/s41587-025-02812-8.
2
Finding easy regions for short-read variant calling from pangenome data.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf103.
3
Finding easy regions for short-read variant calling from pangenome data.
ArXiv. 2025 Aug 8:arXiv:2507.03718v2.
4
Molecular mechanisms of plastic biodegradation by the fungus .
mBio. 2025 Jun 30:e0033525. doi: 10.1128/mbio.00335-25.
5
Region-Based Analysis with Functional Annotation Identifies Genes Associated with Cognitive Function in South Asians from India.
Genes (Basel). 2025 May 27;16(6):640. doi: 10.3390/genes16060640.
6
Double-stranded DNA viruses may serve as vectors for horizontal transfer of intron-generating transposons.
Mob DNA. 2025 Jun 14;16(1):25. doi: 10.1186/s13100-025-00363-y.
7
Are reads required? High-precision variant calling from bacterial genome assemblies.
Access Microbiol. 2025 May 28;7(5). doi: 10.1099/acmi.0.001025.v3. eCollection 2025.
8
The evolution and convergence of mutation spectra across mammals.
Commun Biol. 2025 May 17;8(1):763. doi: 10.1038/s42003-025-08181-x.
9
Whole-Genome Sequencing Reveals Individual and Cohort Level Insights into Chromosome 9p Syndromes.
medRxiv. 2025 Mar 30:2025.03.28.25324850. doi: 10.1101/2025.03.28.25324850.
10
Improving Whole Biodiversity Monitoring and Discovery With Environmental DNA Metagenomics.
Mol Ecol Resour. 2025 Aug;25(6):e14105. doi: 10.1111/1755-0998.14105. Epub 2025 Apr 1.