Lee Kyeryoung, Paek Hunki, Huang Liang-Chin, Hilton C Beau, Datta Surabhi, Higashi Josh, Ofoegbu Nneka, Wang Jingqi, Rubinstein Samuel M, Cowan Andrew J, Kwok Mary, Warner Jeremy L, Xu Hua, Wang Xiaoyan
IMO Health, Rosemont, IL, USA.
Division of Hematology and Oncology, Vanderbilt University, Nashville, TN, USA.
Inform Med Unlocked. 2024;50. doi: 10.1016/j.imu.2024.101589. Epub 2024 Oct 11.
Initial insights into oncology clinical trial outcomes are often gleaned manually from conference abstracts. We aimed to develop an automated system to extract safety and efficacy information from study abstracts with high precision and fine granularity, transforming them into computable data for timely clinical decision-making.
We collected clinical trial abstracts from key conferences and PubMed (2012-2023). The SEETrials system comprises three modules: preprocessing, prompt engineering with knowledge ingestion, and postprocessing. We evaluated the system's performance qualitatively and quantitatively and assessed its generalizability across different cancer types: multiple myeloma (MM), breast cancer, lung cancer, lymphoma, and leukemia. Furthermore, we analyzed the efficacy and safety of innovative MM therapies, including CAR-T, bispecific antibodies, and antibody-drug conjugates (ADC), across a large set of clinical trial studies.
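The abstract names SEETrials' three modules but does not describe their internals. The following Python sketch is a minimal, hypothetical illustration of how a preprocess, prompt-with-knowledge, postprocess flow of this kind can be wired together. It is not the authors' implementation: the `KNOWLEDGE` dictionary, its three entity names, the JSON output contract, and `fake_llm` (a stub standing in for a real model call, returning canned, made-up values) are all assumptions for illustration.

```python
from __future__ import annotations

import json
import re
from collections.abc import Callable

# Hypothetical knowledge ingested into the prompt: target entities with short
# definitions. The real SEETrials schema (70 MM data elements) is NOT shown here.
KNOWLEDGE = {
    "ORR": "overall response rate, as a percentage of evaluable patients",
    "CR": "complete response rate, as a percentage of evaluable patients",
    "grade_3_plus_AE": "adverse events of grade 3 or higher, with percentages",
}


def preprocess(abstract: str) -> str:
    """Collapse runs of whitespace and join numbers to '%' ('88 %' -> '88%')."""
    text = re.sub(r"\s+", " ", abstract).strip()
    return re.sub(r"(\d)\s+%", r"\1%", text)


def build_prompt(text: str, knowledge: dict[str, str]) -> str:
    """Inject entity definitions into the prompt and request a JSON answer."""
    definitions = "\n".join(f"- {name}: {desc}" for name, desc in knowledge.items())
    return (
        "Extract the following entities from the clinical trial abstract.\n"
        f"{definitions}\n"
        "Return a JSON object keyed by entity name; use null if absent.\n\n"
        f"Abstract: {text}"
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; ignores the prompt and returns
    a canned, made-up answer wrapped in a code fence, as chat models often do."""
    return '```json\n{"ORR": "75%", "CR": "40 %", "grade_3_plus_AE": null}\n```'


def postprocess(raw: str, knowledge: dict[str, str]) -> dict[str, float | None]:
    """Strip a surrounding code fence, parse JSON, keep only known entities,
    and convert percentage strings such as '40 %' to floats."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    parsed = json.loads(cleaned)
    result: dict[str, float | None] = {}
    for name in knowledge:
        value = parsed.get(name)
        match = re.search(r"-?\d+(?:\.\d+)?", str(value)) if value is not None else None
        result[name] = float(match.group()) if match else None
    return result


def run_pipeline(abstract: str, llm: Callable[[str], str] = fake_llm) -> dict[str, float | None]:
    """Preprocessing -> prompt engineering with knowledge -> postprocessing."""
    prompt = build_prompt(preprocess(abstract), KNOWLEDGE)
    return postprocess(llm(prompt), KNOWLEDGE)
```

Calling `run_pipeline("ORR was 75 % and CR was 40 %.")` returns `{'ORR': 75.0, 'CR': 40.0, 'grade_3_plus_AE': None}` with the stub. Keeping the model behind a plain `Callable[[str], str]` parameter means a real client could be swapped in without touching the pre- or postprocessing steps.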
SEETrials achieved high precision (0.964), recall (sensitivity) (0.988), and F1 score (0.974) across 70 data elements present in the MM trial studies. Generalizability tests on four additional cancers yielded precision, recall, and F1 scores within the 0.979-0.992 range. The distribution of safety- and efficacy-related entities varied across therapies, with certain adverse events more common with specific treatments. A comparative analysis of overall response rate (ORR) and complete response (CR) highlighted differences among therapies: CAR-T (ORR: 88 %, 95 % CI: 84-92 %; CR: 95 %, 95 % CI: 53-66 %), bispecific antibodies (ORR: 64 %, 95 % CI: 55-73 %; CR: 27 %, 95 % CI: 16-37 %), and ADC (ORR: 51 %, 95 % CI: 37-65 %; CR: 26 %, 95 % CI: 1-51 %). Notable heterogeneity across studies (heterogeneity index scores >75 %) was identified for several outcome entities analyzed within therapy subgroups.
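The pooled ORR and CR estimates with 95 % CIs and the heterogeneity index scores come from combining results across trials, but the abstract does not state which meta-analytic model was used. The following Python sketch shows one common approach, DerSimonian-Laird random-effects pooling of proportions on the logit scale, and assumes that the "heterogeneity index" is Higgins' I². Both the model choice and the study counts in the usage note are assumptions for illustration, not the authors' method or data.

```python
import math


def _logit(p: float) -> float:
    return math.log(p / (1 - p))


def _expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportions(events: list[int], totals: list[int]) -> tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Returns (pooled proportion, 95% CI lower, 95% CI upper, I^2 in percent).
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        e, n = float(e), float(n)
        if e == 0 or e == n:
            # Continuity correction: add 0.5 to events and non-events to avoid log(0).
            e, n = e + 0.5, n + 1.0
        ys.append(_logit(e / n))
        vs.append(1 / e + 1 / (n - e))  # within-study variance of a logit proportion

    # Fixed-effect weights and Cochran's Q.
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # Method-of-moments estimate of between-study variance tau^2 (floored at 0).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and its standard error.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I^2: share of total variation attributable to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return _expit(y_re), _expit(y_re - 1.96 * se), _expit(y_re + 1.96 * se), i2
```

With four hypothetical trials, `pool_proportions([40, 30, 55, 20], [50, 60, 70, 40])` gives an I² of roughly 85 %, the kind of value that would exceed the 75 % threshold mentioned above, whereas identical studies such as `pool_proportions([20, 20, 20], [50, 50, 50])` give I² = 0 and a pooled proportion of 0.4. Pooling on the logit scale keeps the back-transformed CI inside (0, 1); other transforms (e.g. Freeman-Tukey) are also widely used.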
SEETrials demonstrated highly accurate data extraction and versatility across different therapeutics and various cancer domains. Its automated processing of large datasets facilitates nuanced data comparisons, promoting the swift and effective dissemination of clinical insights.