Tempus Labs, Inc., Chicago, IL, 60654, USA. *Joint first authorship.
Pac Symp Biocomput. 2024;29:433-445.
The incompleteness of race and ethnicity information in real-world data (RWD) limits its usefulness for promoting healthcare equity. This study introduces two methods, one heuristic and one based on machine learning, for imputing race and ethnicity from genetic ancestry inferred from tumor profiling data. Analyzing de-identified data from over 100,000 cancer patients sequenced with the Tempus xT panel, we show that both methods outperform existing geolocation- and surname-based methods. The machine learning approach achieves high recall (range: 0.859-0.993) and precision (range: 0.932-0.981) across four mutually exclusive race and ethnicity categories. This work offers a novel way to increase the utility of RWD for studying racial disparities in healthcare.
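The abstract reports recall and precision separately for each of four mutually exclusive categories. As an illustration only, the sketch below shows how such per-category metrics can be computed when an imputation method is allowed to abstain, paired with a simple threshold rule on genetic ancestry fractions. The function names, the 0.8 threshold, and the labels `comp_a`/`X` etc. are hypothetical placeholders; this does not reproduce the authors' heuristic or machine learning model.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional, Sequence


def impute_by_threshold(
    ancestry_fractions: dict[str, float],
    component_to_category: dict[str, str],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the paper
) -> Optional[str]:
    """Hypothetical heuristic: return the category mapped to the dominant
    ancestry component if its fraction meets the threshold, else abstain (None)."""
    if not ancestry_fractions:
        return None
    dominant, fraction = max(ancestry_fractions.items(), key=lambda kv: kv[1])
    if fraction >= threshold:
        return component_to_category.get(dominant)
    return None


def per_category_precision_recall(
    true_labels: Sequence[str],
    predicted_labels: Sequence[Optional[str]],
) -> dict[str, tuple[float, float]]:
    """Return {category: (precision, recall)} for mutually exclusive categories.

    A prediction of None is an abstention: it lowers recall for the true
    category but is not counted toward any category's precision.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")

    actual_counts = Counter(true_labels)      # patients truly in each category
    predicted_counts: Counter = Counter()     # patients imputed into each category
    true_positives: Counter = Counter()       # correct imputations per category

    for truth, pred in zip(true_labels, predicted_labels):
        if pred is None:
            continue  # abstention: no imputed label
        predicted_counts[pred] += 1
        if pred == truth:
            true_positives[pred] += 1

    results: dict[str, tuple[float, float]] = {}
    for category in sorted(set(actual_counts) | set(predicted_counts)):
        tp = true_positives[category]
        precision = tp / predicted_counts[category] if predicted_counts[category] else 0.0
        recall = tp / actual_counts[category] if actual_counts[category] else 0.0
        results[category] = (precision, recall)
    return results


if __name__ == "__main__":
    # Placeholder labels only; not real ancestry components or race/ethnicity categories.
    mapping = {"comp_a": "X", "comp_b": "Y", "comp_c": "Z"}
    ancestries = [
        {"comp_a": 0.95, "comp_b": 0.05},
        {"comp_b": 0.85, "comp_c": 0.15},
        {"comp_a": 0.50, "comp_b": 0.50},  # below threshold: abstain
        {"comp_c": 0.90, "comp_a": 0.10},
    ]
    truth = ["X", "Y", "Y", "Z"]
    preds = [impute_by_threshold(a, mapping) for a in ancestries]
    print(per_category_precision_recall(truth, preds))
    # {'X': (1.0, 1.0), 'Y': (1.0, 0.5), 'Z': (1.0, 1.0)}
```

Counting abstentions against recall but not precision means a more conservative rule (a higher threshold) tends to trade recall for precision, which is why it helps to report both metrics for every category rather than a single overall accuracy.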