Anna University, Chennai, India.
J Med Syst. 2019 Jul 4;43(8):264. doi: 10.1007/s10916-019-1409-z.
Diagnosis remains one of the major challenges in treating cancer. A patient's outcome depends on diagnosing the cancer at an early stage (stage 1 or stage 2); if it is diagnosed at stage 3 or later, the patient's chances of survival worsen. A single patient's records generate a huge amount of data, and if that data can be managed and analyzed, the patterns identified in it can help diagnose the cancer. Several machine learning algorithms have recently been introduced for cancer classification. However, their classification accuracy still degrades when the number of samples is very large. This work therefore proposes an approach built on the Hadoop Distributed File System (HDFS). The proposed phenotype techniques handle and classify raw EHR (Electronic Health Record) and EMR (Electronic Medical Record) data, and are based on HDFS and two-phase MapReduce. The phenotype algorithm uses an NLP (Natural Language Processing) tool to analyze and classify cancer patient data such as gene mapping, age-related data, image and ultrasonic frequency processing, identification and analysis of irregularities, and disease and personal histories. A three-factorized model calculates mean score values from factors such as disease stage and pain status. This paper focuses on big data analytics for cancer diagnosis, and the simulation results show that the proposed system achieves the highest performance.
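The abstract does not specify how the two MapReduce phases or the mean scores are defined, so the following is only a minimal in-memory sketch of the general idea, not the authors' implementation. It assumes a hypothetical record layout (`patient_id`, `stage`, `pain`, `score`) and a hypothetical split of work: phase 1 reduces each patient's records to one mean score, and phase 2 averages those per-patient scores by factor value. Only the two factors named in the abstract (disease stage and pain status) are used; the third factor is not named there and is left out.

```python
from collections import defaultdict


def map_reduce(items, mapper, reducer):
    """Run one map / shuffle / reduce pass in memory (stand-in for a Hadoop job)."""
    groups = defaultdict(list)
    for item in items:
        for key, value in mapper(item):
            groups[key].append(value)  # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Phase 1 (assumed): collapse all records of a patient into one summary.
def map_record(record):
    yield record["patient_id"], record


def reduce_patient(patient_id, records):
    scores = [r["score"] for r in records]
    latest = records[-1]  # assumption: the last record holds the current status
    return {
        "stage": latest["stage"],
        "pain": latest["pain"],
        "mean_score": sum(scores) / len(scores),
    }


# Phase 2 (assumed): average the per-patient scores for each factor value.
def map_patient(item):
    _patient_id, summary = item
    yield ("stage", summary["stage"]), summary["mean_score"]
    yield ("pain", summary["pain"]), summary["mean_score"]


def reduce_mean(key, values):
    return sum(values) / len(values)


def mean_scores_by_factor(records):
    """Chain the two phases: records -> per-patient summaries -> per-factor means."""
    per_patient = map_reduce(records, map_record, reduce_patient)
    return map_reduce(per_patient.items(), map_patient, reduce_mean)
```

On a real cluster each phase would be a separate MapReduce job reading its input from HDFS; the in-memory `map_reduce` helper here only mimics the map, shuffle, and reduce steps so the data flow can be followed on a single machine.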