
CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection.

Affiliations

Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France.

Département de Sciences Biologiques, Université de Montréal, Montréal, QC, Canada.

Publication information

Mol Biol Evol. 2018 Jan 1;35(1):252-255. doi: 10.1093/molbev/msx283.

Abstract

Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or to partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets because of the computational time required. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions of the distribution and primary sequence conservation of these gene families, allowing critical biological analyses of these data.
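The abstract does not spell out the detection procedure. As a rough illustration of how a sequence similarity network can expose composite genes, the sketch below implements only the basic triplet criterion: a sequence `c` is flagged when it is similar to two sequences `a` and `b` that are not similar to each other, and `a` and `b` align to largely non-overlapping regions of `c`. This is a simplified stand-in, not CompositeSearch's actual algorithm. The function names, the `regions` input format (for example, spans parsed from BLAST tabular output), and the `max_overlap` threshold of 20 residues are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected similarity network as a dict of neighbour sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def overlap(r1, r2):
    """Length of the overlap between two inclusive (start, end) intervals."""
    return max(0, min(r1[1], r2[1]) - max(r1[0], r2[0]) + 1)


def fused_triplets(edges, regions, max_overlap=20):
    """Return {candidate_composite: [(component_a, component_b), ...]}.

    regions[(c, x)] is the (start, end) span of sequence c covered by its
    alignment with sequence x (hypothetical input format).
    max_overlap is a hypothetical tolerance, in residues, for how much the
    two component alignments may overlap on c.
    """
    adj = build_adjacency(edges)
    hits = defaultdict(list)
    for c, neighbours in adj.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b in adj[a]:
                continue  # a and b are similar to each other: no fusion signal
            ra, rb = regions.get((c, a)), regions.get((c, b))
            if ra is None or rb is None:
                continue  # alignment coordinates missing for this pair
            if overlap(ra, rb) <= max_overlap:
                hits[c].append((a, b))  # a and b cover distinct parts of c
    return dict(hits)
```

For example, with `edges = [("AB", "A"), ("AB", "B")]` and `regions = {("AB", "A"): (1, 150), ("AB", "B"): (160, 300)}`, the function reports `"AB"` as a candidate composite of `"A"` and `"B"`. Adding an edge between `"A"` and `"B"` removes the signal, because the two components would then look homologous rather than fused.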


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9c9/5850286/f7eb977a63c2/msx283f1.jpg
