Validation of predicted anonymous proteins simply using Fisher's exact test.

Author information

Claverie Jean-Michel, Santini Sébastien

Affiliation

Aix-Marseille University, CNRS, IGS (UMR7256), IMM (FR3479), Luminy, Marseille F-13288, France.

Publication information

Bioinform Adv. 2021 Nov 15;1(1):vbab034. doi: 10.1093/bioadv/vbab034. eCollection 2021.

Abstract

MOTIVATION

Genome sequencing has become the primary (and often the sole) experimental method to characterize newly discovered organisms, in particular from the microbial world (bacteria, archaea, viruses). This generates an ever-increasing number of predicted proteins whose existence is unsubstantiated, in particular among those without homologs in model organisms. As a last resort, the selection pressure computed from pairwise alignments of the corresponding 'Open Reading Frames' (ORFs) can be used to validate their existence. However, this approach is error-prone, as it is not usually associated with a significance test.

RESULTS

We introduce the use of the straightforward Fisher's exact test as a postprocessing step for the results provided by the popular CODEML sequence comparison software. The respective rates of nucleotide changes at nonsynonymous versus synonymous positions (as determined by CODEML) are turned into entries of a 2 × 2 contingency table, the probability of which is computed under the null hypothesis that they should not behave differently if the ORFs do not encode actual proteins. Using the genome sequences of two recently isolated giant viruses, we show that strong negative selection pressures do not always provide a solid argument in favor of the existence of proteins.
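The postprocessing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not spell out the table layout, so the sketch assumes rows for nonsynonymous and synonymous sites and columns for changed and unchanged sites, with counts obtained by rounding N × dN and S × dS. The function names, the two-sided form of the test, and the input values are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def selection_table(n_sites, s_sites, dn, ds):
    """Build the ASSUMED table layout from CODEML-style values.

    n_sites, s_sites: numbers of nonsynonymous / synonymous sites (N, S).
    dn, ds: substitution rates per nonsynonymous / synonymous site.
    Returns (nonsyn changed, nonsyn unchanged, syn changed, syn unchanged).
    """
    n_total, s_total = round(n_sites), round(s_sites)
    # Clamp so that a saturated rate (> 1) cannot yield a negative count.
    n_changed = min(round(n_sites * dn), n_total)
    s_changed = min(round(s_sites * ds), s_total)
    return n_changed, n_total - n_changed, s_changed, s_total - s_changed


# Hypothetical long ORF pair: many sites, clear excess of synonymous changes.
p_long = fisher_exact_two_sided(*selection_table(300, 100, 0.05, 0.5))

# Hypothetical short ORF pair: dN/dS is about 0.33, but very few changes.
p_short = fisher_exact_two_sided(*selection_table(60.3, 20.7, 0.05, 0.15))

print(f"long ORF pair:  p = {p_long:.3g}")
print(f"short ORF pair: p = {p_short:.3g}")
```

With these made-up inputs, the short pair has a low dN/dS ratio yet fails a 0.05 significance threshold, which is the kind of case the abstract warns about. If SciPy is available, `scipy.stats.fisher_exact` applied to the same table should give a comparable two-sided p-value.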

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7b6/9710694/381aa05706ab/vbab034f1.jpg