Prediction whether a human cDNA sequence contains initiation codon by combining statistical information and similarity with protein sequences.

Authors

Nishikawa T, Ota T, Isogai T

Affiliation

Helix Research Institute, Chiba, Japan.

Publication

Bioinformatics. 2000 Nov;16(11):960-7. doi: 10.1093/bioinformatics/16.11.960.

DOI: 10.1093/bioinformatics/16.11.960
PMID: 11159307

Abstract

MOTIVATION

In previous work, we developed ATGpr, a computer program for predicting the fullness of a cDNA, i.e. whether it contains an initiation codon. The prediction algorithm fully exploited statistical information on short nucleotide fragments. However, it did not use sequence similarities to known proteins, which are becoming increasingly available owing to the recent rapid growth of protein databases. In this work, we present a new prediction algorithm based on both statistical and similarity information, which achieves better sensitivity and specificity.

RESULTS

We evaluated the accuracy of ATGpr in predicting the fullness of cDNA sequences from human clustered ESTs in UniGene, and obtained the specificity, sensitivity, and correlation coefficient of this prediction. Specificity and sensitivity crossed at 46% at an ATGpr score threshold of 0.33, and the maximum correlation coefficient of 0.34 was obtained at this threshold. Even without ATGpr, we found alignments with known proteins effective for predicting the fullness of cDNA sequences: specificity increased monotonically as similarity (identity of the alignments) increased, and exceeded 80% when identity was greater than 40%. For more effective prediction of fullness, we combined the similarity (identity of query sequence) with known proteins and the ATGpr score. As a result, specificity exceeded 80% when identity was greater than 20%.
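
The three measures reported in the Results paragraph can be illustrated with a minimal sketch. The abstract does not give the exact definitions used, so this assumes the convention common in gene prediction, where specificity is TP / (TP + FP), and takes the correlation coefficient to be the Matthews correlation coefficient. The function names and the toy scores and labels are hypothetical and are not taken from the paper or from ATGpr itself.

```python
from math import sqrt


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called 'full'."""
    tp = fp = tn = fn = 0
    for score, is_full in zip(scores, labels):
        predicted_full = score >= threshold
        if predicted_full and is_full:
            tp += 1
        elif predicted_full and not is_full:
            fp += 1
        elif not predicted_full and is_full:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(scores, labels, threshold):
    """Return (sensitivity, specificity, correlation coefficient)."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumption: gene-prediction style specificity, TP / (TP + FP).
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    # Assumption: Matthews correlation coefficient.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, cc


# Toy data (hypothetical): prediction scores and whether each cDNA is truly full.
toy_scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.05]
toy_labels = [True, True, False, True, False, False]

if __name__ == "__main__":
    sn, sp, cc = evaluate(toy_scores, toy_labels, threshold=0.33)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f} cc={cc:.2f}")
```

Sweeping `threshold` over a range of values and recording where sensitivity and specificity meet would reproduce the kind of crossing-point analysis the abstract describes, under the same assumptions.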

AVAILABILITY

The prediction program, called ATGpr_sim, is available at http://www.hri.co.jp/atgpr/ATGpr_sim.html

CONTACT

nisikawa@crl.hitachi.co.jp
