SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models.

Affiliations

Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwa, Japan.

Center of Excellence in Computational Molecular Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.

Publication information

Genome Biol. 2024 Jul 25;25(1):195. doi: 10.1186/s13059-024-03298-4.

Abstract

Accurate inference of orthologous genes constitutes a prerequisite for comparative and evolutionary genomics. SonicParanoid is one of the fastest tools for orthology inference; however, its scalability and accuracy have been hampered by time-consuming all-versus-all alignments and the existence of proteins with complex domain architectures. Here, we present a substantial update of SonicParanoid, where a gradient boosting predictor halves the execution time and a language model doubles the recall. Application to empirical large-scale and standardized benchmark datasets shows that SonicParanoid2 is much faster than comparable methods and also the most accurate. SonicParanoid2 is available at https://gitlab.com/salvo981/sonicparanoid2 and https://zenodo.org/doi/10.5281/zenodo.11371108 .
