Predicting genome-wide redundancy using machine learning.

Affiliation

Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA.

Publication information

BMC Evol Biol. 2010 Nov 18;10:357. doi: 10.1186/1471-2148-10-357.

DOI:10.1186/1471-2148-10-357

PMID:21087504

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2998534/

Abstract

BACKGROUND

Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here.

RESULTS

Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-attribute classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis show the signature of predicted redundancy with at least one but typically fewer than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods.
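To make the precision comparison concrete, the following is a minimal sketch of how precision on withheld gene pairs might be computed for a single-attribute threshold versus a rule that combines two attributes. Everything in it is an assumption for illustration: the six gene pairs, their attribute values, the thresholds, and the weights are invented toy numbers, and the fixed linear rule merely stands in for a trained Support Vector Machine. It is not the authors' model or data.

```python
def precision(predicted, actual):
    """Fraction of pairs called redundant that are truly redundant."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    called = sum(1 for p in predicted if p)
    return true_pos / called if called else 0.0


# Hypothetical withheld gene pairs (toy values, not from the paper):
# (sequence_similarity, expression_correlation, is_redundant)
withheld = [
    (0.95, 0.90, True),
    (0.90, 0.10, False),
    (0.85, 0.80, True),
    (0.80, 0.20, False),
    (0.40, 0.85, False),
    (0.30, 0.10, False),
]
labels = [redundant for _, _, redundant in withheld]

# Single-attribute classifier: call a pair redundant on sequence similarity alone.
single_calls = [sim >= 0.75 for sim, _, _ in withheld]

# Combined classifier: a fixed linear decision function over both attributes.
# The weights and offset are assumed here; a real SVM would learn them from
# training pairs that are kept separate from the withheld set.
combined_calls = [0.5 * sim + 0.5 * expr - 0.75 >= 0 for sim, expr, _ in withheld]

print("single-attribute precision:", precision(single_calls, labels))
print("combined precision:", precision(combined_calls, labels))
```

In this constructed example the single-attribute rule calls four pairs redundant and is right about two of them, while the combined rule calls only the two truly redundant pairs. The gap is built into the toy data and only illustrates how combining attributes can remove false positive calls.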

CONCLUSIONS

Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ce6/2998534/7c010ec8ebcf/1471-2148-10-357-1.jpg
