Mohamed Ruwaa I, Ault-Seay Taylor B, Moisá Sonia J, Beever Jonathan E, Ríus Agustín G, Rowan Troy N
Genome Science and Technology Program, Bredesen Center, University of Tennessee, Knoxville, TN, USA.
Department of Animal Science, University of Tennessee Institute of Agriculture (UTIA), Knoxville, TN, USA.
BMC Genomics. 2025 Apr 16;26(1):379. doi: 10.1186/s12864-025-11571-4.
Genetic and genomic selection programs require large numbers of phenotypes observed for animals in shared environments. Phenotypes like meat quality, methane emission, and disease susceptibility are difficult and expensive to measure directly at scale but are critically important to livestock production. Our work draws on the "Central Dogma" of molecular genetics to use molecular intermediates as cheaply measured proxies of organism-level phenotypes. The rapidly declining cost of next-generation sequencing presents opportunities for population-level molecular phenotyping. While the cost of whole transcriptome sequencing has declined recently, its required sequencing depth still makes it an expensive choice for wide-scale molecular phenotyping. We aim to optimize 3' mRNA sequencing (3' mRNA-Seq) approaches for collecting cost-effective proxy molecular phenotypes for cattle from easy-to-collect tissue samples (i.e., whole blood). We used matched 3' mRNA-Seq samples from 15 Holstein male calves in a heat stress trial to identify (1) the better library preparation kit (Takara SMART-Seq v4 3' DE or Lexogen QuantSeq) and (2) the optimal sequencing depth (0.5 to 20 million reads/sample) to capture gene expression phenotypes most cost-effectively.
Takara SMART-Seq v4 3' DE libraries outperformed Lexogen QuantSeq libraries across all metrics: number of quality reads, expressed genes, informative genes, differentially expressed genes, and 3' biased intragenic variants. Serial downsampling analyses showed that as few as 8.0 million reads per sample could effectively capture most of the between-sample variation in gene expression, although progressively more reads did provide marginal increases in recall across metrics. These 3' mRNA-Seq reads can also capture animal genotypes that could be used as the basis for downstream imputation. The 10 million read downsampled groups called an average of 109,700 SNPs and 11,367 INDELs, many of which segregate at moderate minor allele frequencies in the population.
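The idea behind a serial downsampling analysis can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`downsample_counts`, `genes_detected`, `recall_vs_full`), the toy gene labels, and the detection threshold of one read are all assumptions made for illustration. The sketch subsamples reads without replacement to a series of target depths and reports what fraction of the genes detected at full depth are still detected at each reduced depth.

```python
import random
from collections import Counter


def downsample_counts(gene_counts, target_depth, seed=0):
    """Subsample reads without replacement down to target_depth total reads.

    gene_counts maps a gene label to its read count (hypothetical toy input).
    """
    total = sum(gene_counts.values())
    if target_depth > total:
        raise ValueError("target_depth exceeds the number of available reads")
    rng = random.Random(seed)
    # Expand to one label per read. Fine for toy data; a real library with
    # millions of reads would need a more memory-efficient sampler.
    reads = [gene for gene, n in gene_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, target_depth))


def genes_detected(counts, min_reads=1):
    """Number of genes with at least min_reads reads (assumed threshold)."""
    return sum(1 for n in counts.values() if n >= min_reads)


def recall_vs_full(full_counts, sub_counts, min_reads=1):
    """Fraction of genes detected at full depth that survive downsampling."""
    full = {g for g, n in full_counts.items() if n >= min_reads}
    sub = {g for g, n in sub_counts.items() if n >= min_reads}
    return len(full & sub) / len(full) if full else 0.0


# Toy library: a few highly expressed genes and a few rare ones.
toy = {"GENE_A": 5000, "GENE_B": 800, "GENE_C": 40, "GENE_D": 3}

for depth in (100, 1000, 5000):
    sub = downsample_counts(toy, depth, seed=1)
    print(depth, genes_detected(sub), round(recall_vs_full(toy, sub), 2))
```

In a sketch like this, lowly expressed genes are the first to drop out as depth falls, which is why recall tends to keep rising slowly with additional reads even after most of the expression signal has been captured.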
This work demonstrates that 3' mRNA-Seq with Takara SMART-Seq v4 3' DE can provide a highly cost-effective (< 25 USD/sample) approach to quantifying molecular phenotypes (gene expression) while discovering sufficient variation for use in genotype imputation. Ongoing work is evaluating the accuracy of imputation and the ability of much larger datasets to predict individual animal phenotypes.