Division of Gastroenterology and Hepatology, Stanford University Medical Center, Palo Alto, California, USA.
Division of Gastroenterology and Hepatology, University of Michigan, Ann Arbor, Michigan, USA.
Hepatology. 2022 Feb;75(2):430-437. doi: 10.1002/hep.32142. Epub 2021 Dec 7.
Chronic hepatitis B (CHB) affects >290 million persons globally, and only 10% have been diagnosed, presenting a severe gap that must be addressed. We developed logistic regression (LR) and machine learning (ML; random forest) models to accurately identify patients with HBV, using only easily obtained demographic data from a population-based data set.
We identified participants with data on HBsAg, birth year, sex, race/ethnicity, and birthplace from 10 cycles of the National Health and Nutrition Examination Survey (1999-2018) and divided them into two cohorts: training (cycles 2, 3, 5, 6, 8, and 10; n = 39,119) and validation (cycles 1, 4, 7, and 9; n = 21,569). We then developed and tested our two models. The overall cohort was 49.2% male, 39.7% White, 23.2% Black, 29.6% Hispanic, and 7.5% Asian/other, with a median birth year of 1973. In multivariable logistic regression, the following factors were associated with HBV infection: birth year 1991 or after (adjusted OR [aOR], 0.28; p < 0.001); male sex (aOR, 1.49; p = 0.0080); Black and Asian/other versus White race/ethnicity (aOR, 5.23 and 9.13, respectively; p < 0.001 for both); and being US-born (vs. foreign-born; aOR, 0.14; p < 0.001). The ML model consistently outperformed the LR model, with higher area under the receiver operating characteristic curve (AUROC) values (0.83 vs. 0.75 in the validation cohort; p < 0.001) and better differentiation of high- and low-risk persons.
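The cohort split and the discrimination metric described above can be sketched as follows. This is a minimal illustration only, not the authors' code: the record layout (a dictionary with a `cycle` field) and the function names `split_by_cycle` and `auroc` are assumptions introduced here, and the toy labels and scores in the usage note are made up. The only details taken from the abstract are the cycle numbers assigned to each cohort. The area under the receiver operating characteristic curve is computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Dict, List, Sequence, Tuple

# Cycle assignment as stated in the abstract (survey cycles numbered 1-10).
TRAINING_CYCLES = {2, 3, 5, 6, 8, 10}
VALIDATION_CYCLES = {1, 4, 7, 9}


def split_by_cycle(records: Sequence[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition records into (training, validation) by their 'cycle' field.

    The 'cycle' key is a hypothetical field name used for illustration.
    """
    training = [r for r in records if r["cycle"] in TRAINING_CYCLES]
    validation = [r for r in records if r["cycle"] in VALIDATION_CYCLES]
    return training, validation


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    has a higher score than a randomly chosen negative (label 0);
    tied scores contribute 0.5. Runs in O(n_pos * n_neg) time, which
    is fine for a sketch but slow for very large cohorts.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For example, `split_by_cycle([{"cycle": c} for c in range(1, 11)])` places six records in the training list and four in the validation list, and `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75 because three of the four positive-negative pairs are ranked correctly. In practice each model's predicted risk for the validation cohort would be passed as `scores`, and the two resulting values compared.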
Our ML model provides a simple, targeted approach to HBV screening, using only easily obtained demographic data.