A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists.

Authors

Garcia Benjamin J, Urrutia Joshua, Zheng George, Becker Diveena, Corbet Carolyn, Maschhoff Paul, Cristofaro Alexander, Gaffney Niall, Vaughn Matthew, Saxena Uma, Chen Yi-Pei, Gordon D Benjamin, Eslami Mohammed

Affiliations

Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA.

Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA.

Publication information

Synth Biol (Oxf). 2022 Aug 23;7(1):ysac012. doi: 10.1093/synbio/ysac012. eCollection 2022.

DOI: 10.1093/synbio/ysac012
PMID: 36035514
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9408027/
Abstract

Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis are a complex, multistep process that has implications for both discovery and reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present challenges that are computationally intensive. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. We present the design of a pipeline that combines traditional RNASeq data processing tools with structured metadata tracking to allow for the exploration of the combinatorial design in a high-throughput and reproducible manner. We then demonstrate utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.
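The control comparison described in the abstract can be illustrated with a small sketch. The abstract does not say which metric the authors use, so the approach here (pairwise Pearson correlation of log-transformed counts between runs of the same control) is an assumption, as are the function names and the made-up count values. It is meant only to show what "comparing identical controls across days" might look like in practice, not to reproduce the paper's pipeline.

```python
import math
from itertools import combinations


def log_counts(counts):
    """Log2-transform raw counts with a pseudocount of 1 to handle zeros."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_control_correlations(runs):
    """Correlate every pair of runs.

    `runs` maps a run label to per-gene counts, with genes in the same
    order in every run (hypothetical input layout).
    """
    logged = {label: log_counts(counts) for label, counts in runs.items()}
    return {
        (a, b): pearson(logged[a], logged[b])
        for a, b in combinations(sorted(logged), 2)
    }


# Illustrative, invented counts for one control strain sequenced on three days.
control_runs = {
    "day1": [10, 200, 35, 0, 780],
    "day2": [12, 190, 40, 1, 800],
    "day3": [300, 15, 5, 60, 20],  # deliberately inconsistent run
}

correlations = pairwise_control_correlations(control_runs)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In this toy input, the day1 and day2 runs agree closely while day3 does not, which is the kind of between-run discrepancy a control comparison is meant to surface. A real analysis would work on normalized expression values for thousands of genes and would likely use additional metrics.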


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e119/9408027/47dd070e653b/ysac012f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e119/9408027/dac4b32e1ca5/ysac012f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e119/9408027/cf458723e414/ysac012f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e119/9408027/ba60aca3de1b/ysac012f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e119/9408027/f22b68434d13/ysac012f6.jpg