A metric and its derived protein network for evaluation of ortholog database inconsistency.

Authors

Yang Weijie, Ji Jingsi, Fang Gang

Affiliations

NYU-Shanghai, Shanghai, 200120, China.

Software Engineering Institute, East China Normal University, Shanghai, 200062, China.

Publication

BMC Bioinformatics. 2025 Jan 7;26(1):6. doi: 10.1186/s12859-024-06023-x.

Abstract

BACKGROUND

Ortholog prediction, essential for various genomic research areas, faces growing inconsistencies amidst the expanding array of ortholog databases. The common strategy of computing consensus orthologs introduces additional arbitrariness, emphasizing the need to examine the causes of such inconsistencies and identify proteins susceptible to prediction errors.

RESULTS

We introduce the Signal Jaccard Index (SJI), a novel metric rooted in unsupervised genome context clustering, designed to assess protein similarity. Leveraging SJI, we construct a protein network and reveal that peripheral proteins within the network are the primary contributors to inconsistencies in orthology predictions. Furthermore, we show that a protein's degree centrality in the network serves as a strong predictor of its reliability in consensus sets.
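The abstract does not give the exact definition of SJI, so the following is only a minimal sketch of the general idea, under stated assumptions: each protein is represented by a set of "signals" (here, hypothetical genome-context cluster labels), similarity is the plain Jaccard index standing in for SJI, and two proteins are linked whenever their sets overlap at all. The protein names, cluster labels, and the edge rule are illustrative assumptions, not the authors' method.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Plain Jaccard index |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(signals: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Weighted undirected graph: an edge exists when the Jaccard index is > 0.

    Assumption: any nonzero overlap creates an edge; the paper may use a
    different rule.
    """
    graph: Dict[str, Dict[str, float]] = {p: {} for p in signals}
    for p, q in combinations(signals, 2):
        weight = jaccard(signals[p], signals[q])
        if weight > 0:
            graph[p][q] = weight
            graph[q][p] = weight
    return graph


def degree_centrality(graph: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Normalized degree centrality: number of neighbors divided by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {p: 0.0 for p in graph}
    return {p: len(neighbors) / (n - 1) for p, neighbors in graph.items()}


# Hypothetical toy data: protein -> set of genome-context cluster labels.
toy_signals = {
    "protA": {"c1", "c2", "c3"},
    "protB": {"c1", "c2"},
    "protC": {"c2", "c3"},
    "protD": {"c1", "c9"},
    "protE": {"c7"},
}

network = build_network(toy_signals)
dc = degree_centrality(network)

# List proteins from most central to most peripheral.
for protein, score in sorted(dc.items(), key=lambda item: -item[1]):
    print(f"{protein}\t{score:.2f}")
```

In this toy network, protA and protB each connect to three of the four other proteins (DC 0.75), while protE shares no signal with any other protein (DC 0.00) and would count as peripheral in the sense used above.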

CONCLUSIONS

We present an objective, unsupervised SJI-based network encompassing all proteins, whose topological features elucidate ortholog prediction inconsistencies. The degree centrality (DC) effectively identifies error-prone orthology assignments without relying on arbitrary parameters. Notably, DC is stable, unaffected by species selection, and well-suited for ortholog benchmarking. This approach transcends the limitations of universal thresholds, offering a robust and quantitative framework to explore protein evolution and functional relationships.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/348d/11707888/563098886750/12859_2024_6023_Fig1_HTML.jpg
