Construction of the cancer patients' database based on the US National Health and Nutrition Examination Survey (NHANES) datasets for cancer epidemiology research.

Author information

Moon Jinyoung, Mun Yongseok

Affiliations

Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.

Department of Ophthalmology, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, 1, Singil-ro, Yeongdeungpo-gu, Seoul, 07441, South Korea.

Publication information

BMC Med Res Methodol. 2025 Jan 24;25(1):17. doi: 10.1186/s12874-025-02478-5.

DOI: 10.1186/s12874-025-02478-5

PMID: 39856567

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11758729/

Abstract

BACKGROUND

The US National Health and Nutrition Examination Survey (NHANES) dataset does not include a specific question or laboratory test to confirm a history of cancer diagnosis. However, if straightforward variables for cancer history were introduced, NHANES could be used effectively in future cancer epidemiology studies. To address this gap, the authors built a cancer patient database from the NHANES datasets using a set of R scripts.

METHODS

To illustrate how this methodology applies to a real-world problem, the authors extracted the R code used in an academic paper published in another journal on January 30, 2024 (https://doi.org/10.1016/j.heliyon.2024.e24337). This paper focuses on the construction of the database and the analysis using R code. Entire.

RESULTS

In the first example, the urinary concentrations of monocarboxynonyl phthalate, monocarboxyoctyl phthalate, mono-2-ethyl-5-carboxypentyl phthalate, and mono-2-hydroxy-iso-butyl phthalate (all in ng/mL) were used as the independent variables in place of the serum concentrations of perfluorooctanoic acid (PFOA), perfluorooctane sulfonic acid (PFOS), perfluorohexane sulfonic acid (PFHxS), and perfluorononanoic acid (PFNA), respectively. In the second example, the serum concentrations of 2,3,3',4,4'-pentachlorobiphenyl (PCB105), 2,3,4,4',5-pentachlorobiphenyl (PCB114), 2,3',4,4',5-pentachlorobiphenyl (PCB118), and 2,2',3,4,4',5'- and 2,3,3',4,4',6-hexachlorobiphenyl (PCB138) were used as the independent variables in place of the serum concentrations of PFOA, PFOS, PFHxS, and PFNA, respectively.
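Both examples reuse one analysis and change only the exposure variable. The abstract does not describe the statistical models, so the following is only a minimal sketch of that swap, written in Python rather than the authors' R code. The field names (`exposure_a`, `exposure_b`, `breast_dx`), the helper `geometric_mean_by_status`, and the toy values are hypothetical and are not taken from NHANES or from the paper. A simple descriptive statistic stands in for whatever model the authors fitted.

```python
import math
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def geometric_mean_by_status(
    records: Sequence[Record], exposure: str, outcome: str
) -> Dict[int, float]:
    """Geometric mean of `exposure` among participants with outcome 0 and 1.

    Records with a missing or non-positive exposure, or with an outcome
    other than 0 or 1, are skipped.
    """
    logs: Dict[int, List[float]] = {0: [], 1: []}
    for rec in records:
        x, y = rec.get(exposure), rec.get(outcome)
        if x is None or x <= 0 or y not in (0, 1):
            continue
        logs[int(y)].append(math.log(x))
    # Only report groups that have at least one usable record.
    return {k: math.exp(sum(v) / len(v)) for k, v in logs.items() if v}


# Made-up toy records, for illustration only (not NHANES data).
toy = [
    {"exposure_a": 2.0, "exposure_b": 1.0, "breast_dx": 1},
    {"exposure_a": 8.0, "exposure_b": 4.0, "breast_dx": 1},
    {"exposure_a": 1.0, "exposure_b": None, "breast_dx": 0},
]

# Swapping the independent variable is a change of one argument.
for exposure in ("exposure_a", "exposure_b"):
    print(exposure, geometric_mean_by_status(toy, exposure, "breast_dx"))
```

Keeping the exposure name as a parameter means the outcome variable and the rest of the pipeline stay untouched when a different analyte is studied.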

DISCUSSION

This research offers a comprehensive set of R codes aimed at creating a single, user-friendly variable that encapsulates the history of each type of cancer while also considering the age at which the diagnosis was made. The US NHANES provides a wealth of critical data on environmental toxicant exposures. By employing these R codes, researchers can potentially discover numerous new associations between environmental toxicant exposures and cancer diagnoses. Ultimately, these codes could significantly advance the field of cancer epidemiology in relation to environmental toxicant exposure.
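The idea of one variable per cancer type, paired with the age at diagnosis, can be sketched as follows. This is an illustration in Python under stated assumptions, not the authors' R code: the numeric codes in `CANCER_CODES`, the input layout, and the function `cancer_history` are hypothetical, and the real questionnaire variable names and codes would have to be taken from the NHANES codebook for each survey cycle.

```python
from typing import Dict, Optional

# Hypothetical codes for illustration only; real codes must come from the
# official NHANES codebook of the survey cycle being analysed.
CANCER_CODES: Dict[str, int] = {"breast": 1, "prostate": 2, "colon": 3}


def cancer_history(
    reported: Dict[int, Optional[float]],
    ever_cancer: Optional[bool],
) -> Dict[str, Optional[float]]:
    """Collapse one participant's answers into one variable pair per cancer type.

    reported:    maps a reported cancer code to the age at first diagnosis
                 (None when the age is missing).
    ever_cancer: True/False for "ever told you had cancer", None if unanswered.
    """
    out: Dict[str, Optional[float]] = {}
    for name, code in CANCER_CODES.items():
        if ever_cancer is None:
            # Unknown history stays missing rather than being coded as 0.
            out[f"{name}_dx"] = None
            out[f"{name}_age_dx"] = None
        elif ever_cancer and code in reported:
            out[f"{name}_dx"] = 1
            out[f"{name}_age_dx"] = reported[code]
        else:
            out[f"{name}_dx"] = 0
            out[f"{name}_age_dx"] = None
    return out


# A participant who reported hypothetical code 1 (breast), diagnosed at age 52.
row = cancer_history({1: 52.0}, ever_cancer=True)
print(row)
```

Keeping unanswered participants as missing instead of 0 avoids counting them as cancer-free, and the age field allows an analysis to restrict to diagnoses made before or after a given age.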
