QTG-Finder: A Machine-Learning Based Algorithm To Prioritize Causal Genes of Quantitative Trait Loci in Arabidopsis and Rice.

Affiliations

Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305.

Publication information

G3 (Bethesda). 2019 Oct 7;9(10):3129-3138. doi: 10.1534/g3.119.400319.

DOI:10.1534/g3.119.400319

PMID:31358562

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6778793/

Abstract

Linkage mapping is one of the most commonly used methods to identify genetic loci that determine a trait. However, the loci identified by linkage mapping may contain hundreds of candidate genes and require a time-consuming and labor-intensive fine-mapping process to find the causal gene controlling the trait. With the availability of a rich assortment of genomic and functional genomic data, it is possible to develop a computational method to facilitate faster identification of causal genes. We developed QTG-Finder, a machine-learning-based algorithm to prioritize causal genes by ranking genes within a quantitative trait locus (QTL). Two predictive models were trained separately based on known causal genes in Arabidopsis and rice. An independent validation analysis showed that the models could recall about 64% of Arabidopsis and 79% of rice causal genes when the top 20% ranked genes were considered. The top 20% ranked genes can range from 10 to 100 genes, depending on the size of a QTL. The models can prioritize different types of traits, though with different efficiency. We also identified several important features of causal genes, including paralog copy number, being a transporter, being a transcription factor, and containing SNPs that cause a premature stop codon. This work lays the foundation for systematically understanding characteristics of causal genes and establishes a pipeline to predict causal genes based on public data.
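The validation figure quoted above (the share of causal genes recovered among the top 20% of ranked genes in a QTL) can be illustrated with a small sketch. This is not the authors' implementation: the function name `top_fraction_recall`, the gene IDs, and the scores are hypothetical, and the sketch assumes only that a model assigns each gene in a QTL a score where higher means more likely causal.

```python
import math

def top_fraction_recall(qtls, fraction=0.2):
    """Share of causal genes that rank within the top `fraction` of their QTL.

    `qtls` is a list of (scores, causal_gene) pairs. `scores` maps every gene ID
    in one QTL to a model score (assumption: higher means more likely causal).
    """
    hits = 0
    for scores, causal_gene in qtls:
        # Rank the genes in this QTL from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Keep at least one gene so that very small QTLs are still evaluated.
        cutoff = max(1, math.ceil(fraction * len(ranked)))
        if causal_gene in ranked[:cutoff]:
            hits += 1
    return hits / len(qtls)

# Hypothetical toy data: two QTLs with five candidate genes each.
qtl_a = ({"g1": 0.9, "g2": 0.4, "g3": 0.3, "g4": 0.2, "g5": 0.1}, "g1")
qtl_b = ({"h1": 0.8, "h2": 0.7, "h3": 0.5, "h4": 0.2, "h5": 0.1}, "h3")

recall = top_fraction_recall([qtl_a, qtl_b])
print(recall)
```

In this toy case the top 20% of a five-gene QTL is a single gene, so only the first QTL's causal gene is recovered. For a real QTL, the same 20% cutoff corresponds to the 10 to 100 genes mentioned in the abstract.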
