Big Data Analysis and Application of Liver Cancer Gene Sequence Based on Second-Generation Sequencing Technology.

Author affiliations

Faculty of Hepato-Biliary-Pancreatic Surgery, Chinese People's Liberation Army (PLA) General Hospital, Beijing 100853, China.

Faculty of Hepatology Medicine, Chinese People's Liberation Army (PLA) General Hospital, Beijing 100039, China.

Publication information

Comput Math Methods Med. 2022 Aug 16;2022:4004130. doi: 10.1155/2022/4004130. eCollection 2022.

DOI:10.1155/2022/4004130

PMID:36017150

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9398858/

Abstract

As computer storage capacity and algorithmic sophistication have improved rapidly, the exponential growth of available data has accelerated progress in science and technology. Using omics data such as mRNA, microRNA, or DNA methylation data, this study applies traditional clustering methods, including k-means, K-nearest neighbors, hierarchical clustering, affinity propagation, and non-negative matrix factorization, to classify samples into categories. The main findings are:

(1) The assumption that attributes are mutually independent reduces the algorithm's classification performance to some extent. Following the multilevel grid idea, there is a one-to-one mapping from the high-dimensional space to a one-dimensional space. Encoding the hierarchical grid as a one-dimensional grid greatly reduces complexity. The algorithm's logic is relatively simple, and its classification efficiency is very stable.

(2) Converting the two-dimensional representation of the data into a one-dimensional binary representation achieves dimensionality reduction and improves the efficiency of data organization and storage. The grid code expresses the spatial position of the data and preserves its original organization, without abstracting the data objects.

(3) Handling non-discrete and missing values provides a new opportunity for identifying protein targets of small-molecule therapy and yields a better classification result.

(4) A comparison of the three models shows that Naive Bayes is the optimal model. Each iteration consists of alternating expectation and maximization steps, followed by identification and quantification by MS.
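The grid coding in findings (1) and (2) can be illustrated with a small sketch. The abstract does not specify the encoding, so the code below assumes a Z-order (Morton) code, a common way to map a two-dimensional grid cell to a single binary integer through a one-to-one mapping. The function names `grid_cell`, `interleave_bits`, and `deinterleave_bits`, and the parameters `levels` and `bits`, are hypothetical and not taken from the paper.

```python
def grid_cell(value: float, lo: float, hi: float, levels: int) -> int:
    """Map a continuous value in [lo, hi] to a cell index on a grid of 2**levels cells."""
    n = 1 << levels
    idx = int((value - lo) / (hi - lo) * n)
    # Clamp so that value == hi falls into the last cell instead of overflowing.
    return min(max(idx, 0), n - 1)


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bits of x occupy even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # bits of y occupy odd positions
    return code


def deinterleave_bits(code: int, bits: int = 16) -> tuple[int, int]:
    """Invert interleave_bits, recovering the (x, y) cell indices."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


if __name__ == "__main__":
    # Two hypothetical normalized measurements in [0, 1], discretized on an 8 x 8 grid.
    cx = grid_cell(0.40, 0.0, 1.0, levels=3)
    cy = grid_cell(0.70, 0.0, 1.0, levels=3)
    code = interleave_bits(cx, cy, bits=3)
    print(cx, cy, format(code, "06b"), deinterleave_bits(code, bits=3))
```

Because each cell index pair maps to exactly one code and back, the one-dimensional code keeps the spatial position of the data, and cells that share a long code prefix lie in the same coarser grid cell. This is one possible reading of the "multilevel grid" idea, not a confirmed description of the authors' method.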

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4796/9398858/0e10e12a2cdb/CMMM2022-4004130.001.jpg
