An Intelligent Classification System for Cancer Detection Based on DNA Methylation Using ML and Semantic Knowledge in Healthcare.

Authors

Thakare Anuradha, Bhende Manisha, Tesema Mulugeta, Dighriri Mohammed, Bhavani R, Mahmoud Amena

Affiliations

Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, India.

Marathwada Mitra Mandal's Institute of Technology, Pune, India.

Publication information

Comput Intell Neurosci. 2022 Oct 10;2022:4334852. doi: 10.1155/2022/4334852. eCollection 2022.

DOI:10.1155/2022/4334852

PMID:38501034

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10948228/

Abstract

Wearable sensing devices are being used to continuously assess a patient's internal and external wellness and to diagnose chronic conditions such as cancer, Alzheimer's disease, and cardiovascular disease. Wearable technologies and social networking websites have recently become very common in the medical sector. A patient's health can be influenced by a number of factors, including psychological response, emotional stability, and anxiety levels, and these can be evaluated using social network analysis (SNA), a set of graph-theory-based techniques for studying relationship phenomena. SNA therefore has numerous possible uses in health research, ranging from social science to exact science. For example, it can be used to study cooperative networks of healthcare providers, hazard-prone behaviors, infectious disease transmission, and the spread of health promotion and prevention initiatives. Recently, a number of machine learning-based healthcare solutions have been proposed to track chronic illnesses using data from social networks and wearable monitoring devices.

In our proposed approach, we use an intelligent system, assisted by wearable sensors, to classify cancer based on DNA methylation, an important epigenetic process in the human genome that controls gene expression and has been connected to a number of health issues. To address class imbalance and high dimensionality in the massive data of The Cancer Genome Atlas (TCGA), we create a mixed-sampling ensemble classification technique for imbalanced data with the help of biomedical sensors. The technique is based on the Intelligent Synthetic Minority Oversampling (SMOTE) algorithm. Because class imbalance significantly raises the false-negative rate, new minority-class samples are first generated to give a larger data set. This sample expansion introduces noise, that is, data that has been acquired, stored, or altered in a way that prevents the system that originally created it from accessing or using it. Noisy data inflates the storage space required, can drastically distort the findings of any investigation of the collected data, and can affect the sample set of either class, contributing to class imbalance, a common problem in ML datasets. The Tomek Link method is then used to eliminate this noise, producing a reasonably balanced data set. The final classification model is built with the cascade forest structure of the deep forest (GC-Forest) algorithm, in which each layer uses two random forest structures to increase the model's generalization ability. Experiments on DNA methylation data collected with biosensors from six tumor patients show that the mixed-sampling imbalanced-data ensemble classification technique can increase sensitivity to the minority class while maintaining classification accuracy on the majority class.
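
The mixed-sampling step described in the abstract (oversample the minority class, then clean with Tomek links) can be illustrated with a minimal sketch. It assumes plain SMOTE with linear interpolation between minority neighbours and assumes that both members of each Tomek link are dropped; the abstract does not specify the "Intelligent" SMOTE variant or the cleaning rule, so this is not the authors' implementation. The function names `smote`, `tomek_links`, and `smote_tomek` and the toy two-dimensional data are hypothetical, and the GC-Forest classifier is not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def tomek_links(X, y):
    """Return pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j) for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def smote_tomek(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority size, then drop
    both members of every Tomek link (an assumed cleaning rule)."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(0, (len(X) - len(minority)) - len(minority))
    X_all = list(X) + smote(minority, n_new, k, seed)
    y_all = list(y) + [minority_label] * n_new
    drop = {i for pair in tomek_links(X_all, y_all) for i in pair}
    keep = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in keep], [y_all[i] for i in keep]


# Toy data: 6 majority points (label 0) and 3 minority points (label 1).
X_demo = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.0), (0.0, 0.4),
    (5.0, 5.0), (5.2, 5.1), (5.1, 5.3),
]
y_demo = [0] * 6 + [1] * 3
X_res, y_res = smote_tomek(X_demo, y_demo, minority_label=1)
print(len(X_res), y_res.count(0), y_res.count(1))
```

In this toy data the two classes are well separated, so no Tomek links exist and the cleaning step removes nothing. On overlapping classes, removing one round of links can expose new ones, so a single pass does not guarantee a link-free result.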
