SCP4ssd: A Serverless Platform for Nucleotide Sequence Synthesis Difficulty Prediction Using an AutoML Model.

Affiliations

College of Biotechnology, Tianjin University of Science & Technology, Tianjin 300308, China.

Biodesign Center, Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.

Publication information

Genes (Basel). 2023 Feb 28;14(3):605. doi: 10.3390/genes14030605.

DOI: 10.3390/genes14030605
PMID: 36980878
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10048150/

Abstract

DNA synthesis is widely used in synthetic biology to construct and assemble sequences ranging from short ribosome binding sites (RBS) to ultra-long synthetic genomes. Many sequence features, such as GC content and repeat sequences, are known to affect synthesis difficulty and, in turn, synthesis cost. Latent sequence features, especially local characteristics of the sequence, might affect the DNA synthesis process as well. Reliable prediction of the synthesis difficulty of a given sequence is important for reducing cost, but it remains a challenge. In this study, we propose a new automated machine learning (AutoML) approach to predict DNA synthesis difficulty, which achieves an F1 score of 0.930 and outperforms the current state-of-the-art model. We found local sequence features, neglected in previous methods, that might also affect the difficulty of DNA synthesis. Experimental validation based on ten genes of Escherichia coli strain MG1655 shows that our model achieves 80% accuracy, which is also better than the state of the art. Finally, we developed the cloud platform SCP4SSD using an entirely cloud-based serverless architecture for the convenience of end users.
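
To make the feature types named in the abstract concrete, here is a minimal Python sketch of global GC content, a sliding-window (local) GC profile, and a crude repeat count. It is an illustration only, not the feature set or the AutoML pipeline used by SCP4SSD; the function names, the window and step sizes, and the k-mer length are all assumptions chosen for the example.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int = 20, step: int = 10) -> list[float]:
    """GC content of each sliding window: a simple local feature.

    The window and step sizes are illustrative assumptions.
    """
    if len(seq) <= window:
        return [gc_content(seq)]
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def repeated_kmers(seq: str, k: int = 8) -> dict[str, int]:
    """k-mers occurring more than once: a crude proxy for repeat sequences."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def sequence_features(seq: str) -> dict[str, float]:
    """Collect global and local features into one flat record (hypothetical layout)."""
    local = windowed_gc(seq)
    return {
        "gc_global": gc_content(seq),
        "gc_local_min": min(local),
        "gc_local_max": max(local),
        "n_repeated_kmers": float(len(repeated_kmers(seq))),
    }
```

A sequence with a moderate global GC content can still contain a window with extreme GC content, which is the kind of local signal a global summary hides; `gc_local_min` and `gc_local_max` expose it in this sketch.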


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae7/10048150/10d17cdb12c8/genes-14-00605-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae7/10048150/101244d44baa/genes-14-00605-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae7/10048150/6ca220bc0926/genes-14-00605-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae7/10048150/0f6d90fd2a27/genes-14-00605-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ae7/10048150/df3348d712c3/genes-14-00605-g005.jpg
