Nonlinear dimensionality reduction methods for synthetic biology biobricks' visualization.

Authors

Yang Jiaoyun, Wang Haipeng, Ding Huitong, An Ning, Alterovitz Gil

Affiliations

School of Computer and Information, Hefei University of Technology, Tunxi Road, Hefei, 230009, China.

Harvard Medical School, Boston Children's Hospital, Boston, 02115, MA, USA.

Publication information

BMC Bioinformatics. 2017 Jan 19;18(1):47. doi: 10.1186/s12859-017-1484-4.

DOI: 10.1186/s12859-017-1484-4
PMID: 28103789
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5248484/
Abstract

BACKGROUND

Visualizing data through dimensionality reduction is an important strategy in bioinformatics: it can help uncover hidden properties of the data and detect data quality issues such as noise and inappropriately labeled data. Because crowdsourcing-based synthetic biology databases face similar data quality issues, we propose visualizing biobricks to tackle them. However, existing dimensionality reduction methods cannot be applied directly to biobrick datasets. We therefore use normalized edit distance to enhance dimensionality reduction methods, including Isomap and Laplacian Eigenmaps.
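The abstract does not spell out how the edit distance is normalized. The following is a minimal sketch, assuming biobricks are compared as plain sequence strings and assuming normalization by the length of the longer sequence (one common convention, not necessarily the authors' definition). The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer length (an assumed convention)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


def distance_matrix(sequences):
    """Symmetric matrix of pairwise normalized edit distances."""
    return [[normalized_edit_distance(s, t) for t in sequences] for s in sequences]
```

A matrix built this way depends only on pairwise comparisons, so sequences of different lengths can be handled without first converting them to fixed-length feature vectors.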

RESULTS

Biobricks were extracted from the synthetic biology database Registry of Standard Biological Parts, and six combinations of biobrick types were tested. The visualization graphs show both well-discriminated biobricks and inappropriately labeled biobricks. The K-means clustering algorithm is used to quantify the reduction results. The average clustering accuracies for Isomap and Laplacian Eigenmaps are 0.857 and 0.844, respectively. In addition, Laplacian Eigenmaps is 5 times faster than Isomap, and its visualization graph is more concentrated, which helps discriminate biobricks.
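The abstract reports clustering accuracy without defining it. The sketch below shows one plausible reading, assumed here rather than taken from the paper: each cluster is mapped to a distinct biobrick type, and the score is the fraction of items whose mapped cluster matches their label under the best such mapping. The function name and the toy labels are hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of items labeled correctly under the best cluster-to-class mapping.

    Brute force over mappings, which is fine for the handful of classes
    compared at a time; it is not meant for many clusters.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    if not true_labels:
        raise ValueError("at least one item is required")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; no one-to-one mapping exists")

    best_hits = 0
    # Try every way of assigning distinct classes to the observed clusters.
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Toy example: five parts, two types, one part placed in the "wrong" cluster.
labels = ["promoter", "promoter", "terminator", "terminator", "terminator"]
clusters = [1, 1, 0, 0, 1]
print(clustering_accuracy(labels, clusters))  # 0.8
```

Items that stay mismatched under the best mapping are natural candidates for a closer look as possibly mislabeled entries.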

CONCLUSIONS

By combining normalized edit distance with Isomap and Laplacian Eigenmaps, synthetic biology biobricks are successfully visualized in two-dimensional space. Different types of biobricks can be discriminated and inappropriately labeled biobricks can be identified, which could help assess the quality of crowdsourcing-based synthetic biology databases and guide biobrick selection.
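To illustrate how a precomputed dissimilarity matrix can drive a two-dimensional embedding, here is a minimal Isomap-style sketch using NumPy. It follows the textbook recipe (nearest-neighbor graph, shortest-path distances, classical multidimensional scaling) and is not the authors' implementation; the function name, the `n_neighbors` default, and the toy input are assumptions.

```python
import numpy as np


def isomap_from_distances(dist, n_neighbors=2, n_components=2):
    """Isomap-style embedding of a precomputed symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # 1. Neighborhood graph: keep edges to each point's nearest neighbors.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        nearest = [j for j in np.argsort(dist[i]) if j != i][:n_neighbors]
        graph[i, nearest] = dist[i, nearest]
        graph[nearest, i] = dist[i, nearest]  # keep the graph symmetric

    # 2. Geodesic distances: Floyd-Warshall shortest paths over the graph.
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if np.isinf(graph).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # 3. Classical MDS: double-center squared distances, take top eigenvectors.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Toy input: four points spaced one unit apart on a line.
toy = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
coords = isomap_from_distances(toy, n_neighbors=2, n_components=2)
print(coords.shape)  # (4, 2)
```

In practice the input would be a matrix of normalized edit distances between biobrick sequences, and the resulting coordinates could be passed to a scatter plot or to K-means. Laplacian Eigenmaps would need an affinity matrix rather than distances, so a distance-to-similarity conversion would be an additional design choice.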

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e92/5248484/afbc65991896/12859_2017_1484_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e92/5248484/13157f99ba64/12859_2017_1484_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e92/5248484/dbcfeb02882e/12859_2017_1484_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e92/5248484/99cfe77b2369/12859_2017_1484_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e92/5248484/4abfa3b1bd70/12859_2017_1484_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e92/5248484/bebdfd0dea78/12859_2017_1484_Fig6_HTML.jpg
