Stein Ellen, Hüser Matthias, Amirian E Susan, Palchuk Matvey B, Brown Jeffrey S
TriNetX, LLC, Cambridge, Massachusetts, USA.
Harvard Medical School, Boston, Massachusetts, USA.
Pharmacoepidemiol Drug Saf. 2025 Sep;34(9):e70198. doi: 10.1002/pds.70198.
Many clinical data networks focus on a single use case or disease. By contrast, the TriNetX Dataworks-USA Network contains real-world clinical information that can be applied to multiple research questions and use cases. The purpose of this study is to describe the Network's characteristics and its generalizability to the US population, particularly the healthcare-seeking population.
Basic demographics of patients in the Dataworks-USA Network, a large, regularly updated data network containing de-identified patient electronic health record (EHR) information from across the United States, were summarized. To examine the generalizability of the Network, they were compared with the US Census Bureau International Database (IDB) 2022 data and the National Cancer Institute's version of the Census Bureau's U.S. County Population Data for 2022.
Patients in the Dataworks-USA Network are approximately 5 years older than the Census population, and the Network has a larger proportion of female patients. The Network has a lower proportion of patients identified as Asian or White, and a higher proportion identified as Other race, relative to the Census; proportions for the remaining races are similar between the two data sources (< 1% difference). Regionally, Dataworks-USA has a smaller proportion of patients in all race categories compared with the Census because of its larger proportion of patients of Unknown or Other race.
TriNetX's Dataworks-USA Network provides a robust data source for many use cases and is broadly generalizable to the US population, particularly the healthcare-seeking population, with differences related to the underlying nature of the data sources.