Division of Gastroenterology, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
Brigham Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
Hepatol Commun. 2024 Mar 29;8(4). doi: 10.1097/HC9.0000000000000403. eCollection 2024 Apr 1.
Histopathology remains the gold standard for diagnosing and staging metabolic dysfunction-associated steatotic liver disease (MASLD). The feasibility of studying MASLD progression in electronic medical records based on histological features is limited by the free-text nature of pathology reports. Here we introduce a natural language processing (NLP) algorithm to automatically score MASLD histology features.
From the Mass General Brigham health care system electronic medical record, we identified all patients (1987-2021) with steatosis on index liver biopsy after excluding excess alcohol use and other etiologies of liver disease. An NLP algorithm was constructed in Python to detect steatosis, lobular inflammation, ballooning, and fibrosis stage from pathology free-text and was manually validated in >1200 pathology reports. Patients were followed from the index biopsy to incident decompensated liver disease, with analyses adjusted for covariates.
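The abstract does not describe the algorithm's rules, so the following is only a minimal sketch of how rule-based extraction of these four concepts from free text might look. The patterns, the negation heuristic, and the function name `score_report` are all hypothetical illustrations, not the published method.

```python
import re

# Hypothetical patterns: the abstract does not give the actual rules used.
CONCEPTS = {
    "steatosis": re.compile(r"\bsteatosis\b", re.IGNORECASE),
    "lobular_inflammation": re.compile(r"\blobular\s+inflammation\b", re.IGNORECASE),
    "ballooning": re.compile(r"\bballoon(?:ing|ed)\b", re.IGNORECASE),
}

# Matches phrases such as "stage 2", "stage 2/4", or "stage 2 of 4".
FIBROSIS_STAGE = re.compile(
    r"\bstage\s*([0-4])\b(?:\s*(?:/|of|out of)\s*4)?", re.IGNORECASE
)

# A negation cue that reaches the concept without crossing a "." or ";".
NEGATION = re.compile(
    r"\b(?:no|without|absent|negative for)\b[^.;]*$", re.IGNORECASE
)


def score_report(text):
    """Return a dict of concept -> True / False / None (not mentioned),
    plus an integer fibrosis stage or None."""
    result = {}
    for name, pattern in CONCEPTS.items():
        match = pattern.search(text)
        if match is None:
            result[name] = None  # concept is not mentioned at all
            continue
        # The concept counts as absent if a negation cue precedes it
        # within the same sentence or clause.
        preceding = text[: match.start()]
        result[name] = NEGATION.search(preceding) is None
    stage = FIBROSIS_STAGE.search(text)
    result["fibrosis_stage"] = int(stage.group(1)) if stage else None
    return result


report = (
    "Moderate macrovesicular steatosis with mild lobular inflammation. "
    "No hepatocyte ballooning. Fibrosis stage 2 of 4."
)
print(score_report(report))
```

A sketch like this only inspects the first mention of each concept and would need far more patterns to cope with real reports, which is presumably why manual validation against a large set of reports matters.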
The NLP algorithm demonstrated positive and negative predictive values from 93.5% to 100% for all histologic concepts. Among 3134 patients with biopsy-confirmed MASLD followed for 20,604 person-years, rates of the composite endpoint increased monotonically with worsening index fibrosis stage (p for linear trend <0.005). Compared with simple steatosis (incidence rate, 15.06/1000 person-years), the multivariable-adjusted HRs for cirrhosis were 1.04 (0.72-1.50) for metabolic dysfunction-associated steatohepatitis (MASH)/F0, 1.19 (0.92-1.54) for MASH/F1, 1.89 (1.41-2.52) for MASH/F2, and 4.21 (3.26-5.43) for MASH/F3.
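For readers less familiar with the metrics reported above, this short sketch shows how positive and negative predictive values and an incidence rate per 1000 person-years are conventionally computed. The function names and all counts in the usage lines are made up for illustration and are not taken from the study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from counts of NLP output versus manual review.

    tp/fp: NLP-positive reports that manual review did / did not confirm.
    tn/fn: NLP-negative reports that manual review did / did not confirm.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


def incidence_rate_per_1000(events, person_years):
    """Return events per 1000 person-years of follow-up."""
    return 1000.0 * events / person_years


# Hypothetical counts, chosen only to illustrate the arithmetic.
ppv, npv = predictive_values(tp=187, fp=13, tn=290, fn=10)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
print(incidence_rate_per_1000(events=15, person_years=1000))
```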
The NLP algorithm accurately scores histological features of MASLD from pathology free-text. This algorithm enabled the construction of a large, high-quality MASLD cohort across a multihospital health care system and revealed a risk of cirrhosis that accelerates with the index MASLD fibrosis stage.