Localizing triplet periodicity in DNA and cDNA sequences.

Affiliation

Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY 11724, USA.

Publication information

BMC Bioinformatics. 2010 Nov 8;11:550. doi: 10.1186/1471-2105-11-550.

DOI:10.1186/1471-2105-11-550
PMID:21059240
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2992068/
Abstract

BACKGROUND

The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) because coding exons consist of a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in that of non-coding segments. This property has been used to identify the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, because the Fourier Transform must be computed across a segment (window) of arbitrary size, the accuracy with which TP boundaries can be localized is limited. Here, we report a technique that identifies these boundaries at higher resolution, and we use it to explore the biological correlates of TP regions in the genome of the model organism C. elegans.
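
The windowed Fourier approach described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the common binary indicator (Voss) mapping, in which each nucleotide gets its own 0/1 track, a rectangular window, and hypothetical defaults (`window=120`, `step=3`); the function names are ours. For each window, the squared DFT magnitude at frequency 1/3 is summed over the four tracks.

```python
import cmath
import math


def indicator_sequences(seq):
    """Binary indicator (Voss) mapping: one 0/1 track per nucleotide."""
    seq = seq.upper()
    return {b: [1.0 if c == b else 0.0 for c in seq] for b in "ACGT"}


def period3_power(seq, window=120, step=3):
    """Short-time Fourier power at frequency 1/3 with a rectangular window.

    Returns a list of (window_start, power) pairs, where power is the squared
    DFT magnitude at frequency 1/3 summed over the four indicator tracks.
    """
    tracks = indicator_sequences(seq)
    # DFT kernel at frequency 1/3, indexed relative to the window start.
    kernel = [cmath.exp(-2j * math.pi * n / 3) for n in range(window)]
    out = []
    for start in range(0, len(seq) - window + 1, step):
        power = 0.0
        for u in tracks.values():
            coeff = sum(u[start + n] * kernel[n] for n in range(window))
            power += abs(coeff) ** 2
        out.append((start, power))
    return out


if __name__ == "__main__":
    # Hypothetical test sequence: 120 nt without 3-periodicity, then 120 nt of codon repeats.
    seq = "AC" * 60 + "ATG" * 40
    for start, power in period3_power(seq)[::10]:
        print(start, round(power, 1))
```

The window length fixes how sharply a TP boundary can be placed: a coding region only reaches full power once an entire window lies inside it, which is the localization limit the abstract attributes to windowed Fourier methods.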

RESULTS

Using both simulated TP signals and the real C. elegans sequence F56F11 as examples, we demonstrate that (1) the Modified Wavelet Transform (MWT) can define the boundaries of TP regions better than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: larger values of a give sharper TP boundaries but a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false-positive signals, which can be removed with a 6 bp MWT.
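
For contrast with a rectangular STFT window, a wavelet-style transform weights each position with a smooth envelope centered on it. The abstract does not give the form of the authors' MWT, so the sketch below is a generic Gaussian-windowed (Gabor/Morlet-like) transform at period 3, not the paper's method. `sigma` (envelope width in nucleotides) and `transition_width` are our own hypothetical names, and `sigma` is not claimed to correspond to the scale parameter a. It illustrates the trade-off reported above: a narrower envelope places the boundary more sharply but averages over fewer codons.

```python
import cmath
import math


def gaussian_period3_transform(seq, sigma=6.0, period=3):
    """Gaussian-windowed (Gabor/Morlet-style) transform at a single period.

    At every position m, returns the power summed over the four nucleotide
    indicator tracks of
        sum_n u_b[n] * exp(-(n - m)**2 / (2 * sigma**2)) * exp(-2j*pi*n/period),
    with the Gaussian envelope truncated at 4 sigma. Non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    half = int(math.ceil(4 * sigma))
    envelope = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-half, half + 1)]
    phase = [cmath.exp(-2j * math.pi * n / period) for n in range(len(seq))]
    scores = []
    for m in range(len(seq)):
        coeffs = {b: 0j for b in "ACGT"}
        for k in range(-half, half + 1):
            n = m + k
            if 0 <= n < len(seq) and seq[n] in coeffs:
                coeffs[seq[n]] += envelope[k + half] * phase[n]
        scores.append(sum(abs(c) ** 2 for c in coeffs.values()))
    return scores


def transition_width(scores, lo=0.1, hi=0.9):
    """Count positions whose score lies strictly between lo and hi times the maximum."""
    top = max(scores)
    return sum(1 for s in scores if lo * top < s < hi * top)


if __name__ == "__main__":
    # Hypothetical sequence: non-coding-like "AC" repeat with a TP boundary at position 300.
    seq = "AC" * 150 + "ATG" * 100
    for sigma in (3.0, 12.0):
        scores = gaussian_period3_transform(seq, sigma=sigma)
        print(sigma, transition_width(scores[250:350]))
```

On this synthetic example the narrower envelope yields a much smaller transition width around the boundary at position 300. On real sequence the same narrowing also makes the score noisier, since fewer codons contribute to each position.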

CONCLUSIONS

MWT can provide more precise TP boundaries than STFT, and these boundaries can be further refined with a larger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that may have been lost through ancient frame-shifts. More importantly, the TP signal could potentially be used to detect splice junctions in fully spliced mRNA sequences.
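
The reason a 6 bp periodicity produces false positives, and how subtracting it helps, can be illustrated with plain DFT power at two frequencies. A period-6 repeat has a harmonic at 2/6 = 1/3 and can therefore mimic a coding signal, whereas a strictly period-3 signal has no component at 1/6 over a window spanning a whole number of 6 bp periods. The rule below (period-3 power minus period-6 power, clipped at zero) is a hypothetical simplification for illustration only; the abstract does not specify how the authors scale or combine the 6 bp MWT signal, and the function names and the test repeat `AATTTT` are ours.

```python
import cmath
import math


def window_power(seq, freq):
    """DFT power at `freq` (cycles per nucleotide), summed over the four indicator tracks."""
    coeffs = {b: 0j for b in "ACGT"}
    for n, base in enumerate(seq.upper()):
        if base in coeffs:
            coeffs[base] += cmath.exp(-2j * math.pi * freq * n)
    return sum(abs(c) ** 2 for c in coeffs.values())


def corrected_tp(seq):
    """Hypothetical correction: period-3 power minus period-6 power, clipped at zero."""
    p3 = window_power(seq, 1 / 3)
    p6 = window_power(seq, 1 / 6)
    return max(0.0, p3 - p6)


if __name__ == "__main__":
    codon_repeat = "ATG" * 40      # pure 3-periodicity, no 1/6 component
    six_bp_repeat = "AATTTT" * 20  # 6 bp periodicity with a harmonic at 1/3
    for name, s in (("codon", codon_repeat), ("6 bp", six_bp_repeat)):
        print(name, round(window_power(s, 1 / 3)), round(window_power(s, 1 / 6)), round(corrected_tp(s)))
```

For these two 120 nt inputs the period-3 powers are 4800 and 800, so the 6 bp repeat would pass a low TP threshold on its own; after the correction only the codon repeat keeps a nonzero score (4800 versus 0).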

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/c829f070a013/1471-2105-11-550-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/0bad0d02edb4/1471-2105-11-550-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/f982b91312bc/1471-2105-11-550-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/3a2eb28e9025/1471-2105-11-550-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/66b4bc1db8a7/1471-2105-11-550-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/4f8e4b1ba8a3/1471-2105-11-550-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/19b62b453569/1471-2105-11-550-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/fbd1a53e91aa/1471-2105-11-550-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c248/2992068/4445aa7b2014/1471-2105-11-550-8.jpg
