Technique of Gene Expression Profiles Extraction Based on the Complex Use of Clustering and Classification Methods.

Authors

Babichev Sergii, Škvor Jiří

Affiliations

Department of Informatics, Faculty of Science, Jan Evangelista Purkyně University in Ústí nad Labem, 40096 Ústí nad Labem, Czech Republic.

Department of Computer Science, Software Engineering and Economic Cybernetics, Faculty of Computer Science, Physics and Mathematics, Kherson State University, Kherson 73003, Ukraine.

Publication information

Diagnostics (Basel). 2020 Aug 12;10(8):584. doi: 10.3390/diagnostics10080584.

DOI: 10.3390/diagnostics10080584
PMID: 32806785
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7460566/
Abstract

In this paper, we present the results of research on extracting informative gene expression profiles from a high-dimensional array of gene expression data, taking into account the state of the patients' health, using a clustering method, ML-based binary classifiers, and a fuzzy inference system. The proposed stepwise procedure allows us to extract the most informative genes with respect to the subtype of the disease or the state of the patient's health, for subsequent reconstruction of gene regulatory networks based on the selected genes and simulation of the reconstructed models. As experimental data, we used publicly available gene expression data obtained from DNA microarray experiments, containing two types of gene expression profiles: those of patients with lung cancer tumors and those of healthy patients. The data processing procedure comprises the following steps. First, we reduce the number of genes by removing genes that are non-informative in terms of statistical criteria and Shannon entropy. Then, we perform stepwise hierarchical clustering of the gene expression profiles at hierarchical levels 1 to 10 using the SOTA (Self-Organizing Tree Algorithm) clustering algorithm with the correlation distance metric. The quality of the resulting clustering was evaluated using a complex clustering quality criterion that considers both the distribution of the gene expression profiles relative to the centers of the clusters to which they are allocated and the distribution of the cluster centers. The result of this stage was the selection, at each hierarchical level, of the optimal cluster, corresponding to the minimum value of the quality criterion. At the next step, we implemented a classification procedure for the examined objects using four well-known binary classifiers: logistic regression, support vector machine, decision tree, and random forest.
The effectiveness of each technique was evaluated by ROC (Receiver Operating Characteristic) analysis, using criteria whose components include errors of both the first and the second kind. The final decision on the extraction of the most informative subset of gene expression profiles was made using a fuzzy inference system, whose inputs are the results of the individual classifiers and whose output is the final decision concerning the state of the patient's health. In our view, the proposed stepwise procedure for extracting informative gene expression profiles creates the conditions for increasing the effectiveness of the subsequent reconstruction of gene regulatory networks and of the simulation of the reconstructed models, considering the subtypes of the disease and/or the state of the patient's health.
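
The first step of the procedure (removing non-informative genes by statistical criteria and Shannon entropy) and the correlation distance used for clustering can be illustrated with a minimal sketch. The abstract does not give the entropy estimator, the thresholds, or the direction of the entropy cut, so everything below is an assumption for illustration: a histogram-based entropy estimate, variance as the only statistical criterion, a filter that keeps profiles with variance at or above `min_variance` and entropy at or below `max_entropy`, and the hypothetical function names themselves. It is not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(profile, bins=8):
    """Histogram-based Shannon entropy (in bits) of one expression profile.

    Assumption: equal-width bins over the profile's own value range.
    """
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return 0.0  # a constant profile has zero entropy
    width = (hi - lo) / bins
    # Map each value to a bin index; the maximum value goes into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in profile)
    n = len(profile)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def variance(profile):
    """Population variance of one expression profile."""
    mean = sum(profile) / len(profile)
    return sum((v - mean) ** 2 for v in profile) / len(profile)


def filter_profiles(profiles, min_variance, max_entropy, bins=8):
    """Keep profiles passing both (assumed) thresholds.

    `profiles` maps a gene name to its list of expression values.
    """
    return {
        name: values
        for name, values in profiles.items()
        if variance(values) >= min_variance
        and shannon_entropy(values, bins) <= max_entropy
    }


def correlation_distance(x, y):
    """Correlation distance: 1 minus the Pearson correlation of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (norm_x * norm_y)
```

For example, `filter_profiles({"flat": [5, 5, 5, 5], "step": [0, 0, 10, 10]}, min_variance=1.0, max_entropy=1.5, bins=4)` drops the constant profile and keeps `"step"`. The correlation distance ranges from 0 (perfectly correlated profiles) to 2 (perfectly anti-correlated profiles), which is why it groups genes by the shape of their expression pattern and not by absolute expression level.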
