Utilization of five data mining algorithms combined with simplified preprocessing to establish reference intervals of thyroid-related hormones for non-elderly adults.

Author affiliations

Department of Laboratory Medicine, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing, 100730, China.

Department of Laboratory Medicine, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, No. 1 Shuaifu Yuan, Dongcheng District, Beijing, 100730, China.

Publication information

BMC Med Res Methodol. 2023 May 2;23(1):108. doi: 10.1186/s12874-023-01898-5.

DOI:10.1186/s12874-023-01898-5

PMID:37131135

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10152698/

Abstract

BACKGROUND

Despite extensive research on data mining algorithms, there is still no standard protocol for evaluating the performance of existing algorithms. This study therefore aims to provide a novel procedure that combines data mining algorithms with simplified preprocessing to establish reference intervals (RIs), and to assess the performance of five algorithms objectively.

METHODS

Two data sets were derived from a population undergoing physical examination. The Hoffmann, Bhattacharya, Expectation Maximization (EM), kosmic, and refineR algorithms, each combined with two-step data preprocessing, were applied to the Test data set to establish RIs for thyroid-related hormones. The algorithm-calculated RIs were compared with the standard RIs calculated from the Reference data set, in which reference individuals were selected according to strict inclusion and exclusion criteria. The methods were assessed objectively using the bias ratio (BR) matrix.
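The abstract does not define the bias ratio. The following is a minimal sketch, assuming the definition commonly used in RI comparison studies: the absolute difference between an algorithm-derived limit and the corresponding standard limit, divided by the standard deviation implied by the standard RI, (upper - lower) / 3.92. The function names and the example values are hypothetical and are not taken from the study. A cut-off of 0.375 is often quoted for an acceptable BR, but the abstract does not state which threshold was applied.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (lower limit, upper limit)


def bias_ratio(estimated: float, standard: float, standard_ri: Interval) -> float:
    """Bias of one RI limit, scaled by the SD implied by the standard RI.

    Assumed definition: |estimated - standard| / ((upper - lower) / 3.92),
    where 3.92 = 2 * 1.96 is the width of a central 95% Gaussian interval
    expressed in standard deviations.
    """
    lower, upper = standard_ri
    if upper <= lower:
        raise ValueError("standard RI must have upper > lower")
    implied_sd = (upper - lower) / 3.92
    return abs(estimated - standard) / implied_sd


def br_matrix(
    standard_ri: Interval, algorithm_ris: Dict[str, Interval]
) -> Dict[str, Tuple[float, float]]:
    """Return {algorithm: (BR of lower limit, BR of upper limit)}."""
    return {
        name: (
            bias_ratio(ri[0], standard_ri[0], standard_ri),
            bias_ratio(ri[1], standard_ri[1], standard_ri),
        )
        for name, ri in algorithm_ris.items()
    }


# Hypothetical numbers, chosen only to show the calculation.
standard = (0.0, 3.92)
estimates = {"algorithm_A": (0.25, 4.42), "algorithm_B": (0.05, 3.90)}
for name, (br_low, br_high) in br_matrix(standard, estimates).items():
    print(f"{name}: BR lower = {br_low:.3f}, BR upper = {br_high:.3f}")
```

Arranging the results by algorithm and by limit, one analyte at a time, gives a table that can be read as a BR matrix: smaller entries mean closer agreement with the standard RI.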

RESULTS

RIs of thyroid-related hormones are established. The TSH RIs established by the EM algorithm are highly consistent with the standard TSH RIs (BR = 0.063), although the EM algorithm seems to perform poorly on the other hormones. The RIs calculated by the Hoffmann, Bhattacharya, and refineR methods for free and total triiodothyronine and for free and total thyroxine are close to one another and match the standard RIs.

CONCLUSION

An effective approach for objectively evaluating algorithm performance based on the BR matrix is established. The EM algorithm combined with simplified preprocessing can handle markedly skewed data, but its performance is limited in other scenarios. The other four algorithms perform well for data with a Gaussian or near-Gaussian distribution. Choosing the algorithm according to the distribution characteristics of the data is recommended.
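The recommendation to match the algorithm to the data distribution can be illustrated with a simple skewness check. This is only a sketch: the abstract does not say how skewness was quantified or where the boundary between "near-Gaussian" and "significantly skewed" lies, so the moment-based skewness estimator, the cut-off of 1.0, and the function names below are all assumptions.

```python
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    return m3 / m2 ** 1.5


def suggest_algorithms(values: Sequence[float], skew_cutoff: float = 1.0) -> str:
    """Map the skewness to the algorithm families named in the conclusion.

    skew_cutoff is a hypothetical threshold, not a value from the study.
    """
    if abs(sample_skewness(values)) > skew_cutoff:
        return "EM"
    return "Hoffmann, Bhattacharya, kosmic, or refineR"


# Toy data: a symmetric sample and a sample with one large value.
print(suggest_algorithms([1.0, 2.0, 3.0, 4.0, 5.0]))
print(suggest_algorithms([1.0, 1.0, 1.0, 1.0, 10.0]))
```

In practice the check would be run on the preprocessed results for each analyte, and a histogram or Q-Q plot would be a sensible complement to a single summary statistic.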
