
RNA-clique: a method for computing genetic distances from RNA-seq data.

Affiliations

Department of Computer Science, University of Kentucky, 329 Rose St, Lexington, KY, 40508, USA.

Department of Plant Pathology, University of Kentucky, 1405 Veterans Dr, Lexington, KY, 40546, USA.

Publication information

BMC Bioinformatics. 2024 Jun 4;25(1):205. doi: 10.1186/s12859-024-05811-9.

DOI:10.1186/s12859-024-05811-9
PMID:38834962
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11149392/
Abstract

BACKGROUND

Although RNA-seq data are traditionally used for quantifying gene expression levels, the same data could be useful in an integrated approach to compute genetic distances as well. Challenges to using mRNA sequences for computing genetic distances include the relatively high conservation of coding sequences and the presence of paralogous and, in some species, homeologous genes.

RESULTS

We developed a new computational method, RNA-clique, for calculating genetic distances using assembled RNA-seq data and assessed the efficacy of the method using biological and simulated data. The method employs reciprocal BLASTn followed by graph-based filtering to ensure that only orthologous genes are compared. Each vertex in the graph constructed for filtering represents a gene in a specific sample under comparison, and an edge connects a pair of vertices if the genes they represent are best matches for each other in their respective samples. The distance computation is a function of the BLAST alignment statistics and the constructed graph and incorporates only those genes that are present in some complete connected component of this graph. As a biological testbed we used RNA-seq data of tall fescue (Lolium arundinaceum), an allohexaploid plant (2n = 6x = 42), and bluehead wrasse (Thalassoma bifasciatum), a teleost fish. RNA-clique reliably distinguished individual tall fescue plants by genotype and distinguished bluehead wrasse RNA-seq samples by individual. In tests with simulated RNA-seq data, the ground truth phylogeny was accurately recovered from the computed distances. Moreover, tests of the algorithm parameters indicated that, even with stringent filtering for orthologs, sufficient sequence data were retained for the distance computations. Although comparisons with an alternative method revealed that RNA-clique has relatively high time and memory requirements, the comparisons also showed that RNA-clique's results were at least as reliable as the alternative's for tall fescue data and were much more reliable for the bluehead wrasse data.
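The filtering idea described above can be illustrated with a small Python sketch. It is not the authors' implementation: the input layout (`best_hits`), the function names, and the distance formula (one minus pooled identity over the retained alignments) are all assumptions made for illustration, and "complete connected component" is read here as a component with exactly one gene per sample in which every pair of genes is a reciprocal best match.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: the best BLASTn hit for each
# (query sample, query gene, subject sample), stored as
# (subject gene, identical positions, alignment length).
best_hits = {
    ("A", "a1", "B"): ("b1", 98, 100), ("B", "b1", "A"): ("a1", 98, 100),
    ("A", "a1", "C"): ("c1", 95, 100), ("C", "c1", "A"): ("a1", 95, 100),
    ("B", "b1", "C"): ("c1", 96, 100), ("C", "c1", "B"): ("b1", 96, 100),
    ("A", "a2", "B"): ("b2", 99, 100), ("B", "b2", "A"): ("a2", 99, 100),
    ("A", "a2", "C"): ("c1", 80, 100),  # not reciprocal: c1 prefers a1
}

def reciprocal_edges(hits):
    """Connect two genes only if each is the other's best hit."""
    edges = {}
    for (qs, qg, ss), (sg, ident, length) in hits.items():
        back = hits.get((ss, sg, qs))
        if back is not None and back[0] == qg:
            edges[frozenset([(qs, qg), (ss, sg)])] = (ident, length)
    return edges

def components(edges):
    """Connected components of the gene graph, found by depth-first search."""
    adj = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def complete_components(edges, n_samples):
    """Keep components with one gene per sample and all pairs connected."""
    kept = []
    for comp in components(edges):
        samples = {sample for sample, _ in comp}
        is_clique = all(frozenset(p) in edges for p in combinations(comp, 2))
        if len(comp) == n_samples == len(samples) and is_clique:
            kept.append(comp)
    return kept

def distance_matrix(edges, comps):
    """Assumed distance: 1 - (identical positions / aligned positions)."""
    ident, length = defaultdict(int), defaultdict(int)
    for comp in comps:
        for u, v in combinations(sorted(comp), 2):
            i, l = edges[frozenset([u, v])]
            ident[(u[0], v[0])] += i
            length[(u[0], v[0])] += l
    return {pair: 1 - ident[pair] / length[pair] for pair in length}

edges = reciprocal_edges(best_hits)
kept = complete_components(edges, n_samples=3)
dist = distance_matrix(edges, kept)
```

In this toy input, the genes a1, b1, and c1 form a triangle and are retained, while a2 and b2 are discarded because no gene in sample C is a reciprocal best match for them. Only the retained triangle contributes to the pairwise distances.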

CONCLUSION

Results of this work indicate that RNA-clique works well as a way of deriving genetic distances from RNA-seq data, thus providing a methodological integration of functional and genetic diversity studies.



Similar articles

1
RNA-clique: a method for computing genetic distances from RNA-seq data.
BMC Bioinformatics. 2024 Jun 4;25(1):205. doi: 10.1186/s12859-024-05811-9.
2
A graph-based algorithm for RNA-seq data normalization.
PLoS One. 2020 Jan 24;15(1):e0227760. doi: 10.1371/journal.pone.0227760. eCollection 2020.
3
Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences.
BMC Bioinformatics. 2006 Jul 19;7:350. doi: 10.1186/1471-2105-7-350.
4
Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data.
Nucleic Acids Res. 2019 Dec 16;47(22):e143. doi: 10.1093/nar/gkz826.
5
Evolutionary history of tall fescue morphotypes inferred from molecular phylogenetics of the Lolium-Festuca species complex.
BMC Evol Biol. 2010 Oct 12;10:303. doi: 10.1186/1471-2148-10-303.
6
A topology-preserving dimensionality reduction method for single-cell RNA-seq data using graph autoencoder.
Sci Rep. 2021 Oct 8;11(1):20028. doi: 10.1038/s41598-021-99003-7.
7
Using RNentropy to Detect Significant Variation in Gene Expression Across Multiple RNA-Seq or Single-Cell RNA-Seq Samples.
Methods Mol Biol. 2021;2284:77-96. doi: 10.1007/978-1-0716-1307-8_6.
8
GE-Impute: graph embedding-based imputation for single-cell RNA-seq data.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac313.
9
Optimal Gene Filtering for Single-Cell data (OGFSC)-a gene filtering algorithm for single-cell RNA-seq data.
Bioinformatics. 2019 Aug 1;35(15):2602-2609. doi: 10.1093/bioinformatics/bty1016.
10
The maximum clique enumeration problem: algorithms, applications, and implementations.
BMC Bioinformatics. 2012 Jun 25;13 Suppl 10(Suppl 10):S5. doi: 10.1186/1471-2105-13-S10-S5.
