Banerjee Shilpika, Le Phi, Yang Hai, Zhang Li, He Tao
Department of Mathematics, San Francisco State University, San Francisco, CA 94132, USA.
Department of Medicine, University of California San Francisco, San Francisco, CA 94143 USA.
Stat Innov. 2024;1. Epub 2024 Dec 30.
T-cell receptors (TCRs) play a pivotal role in antigen recognition and binding, and their sequence similarity significantly affects the breadth of antigen recognition. Network analysis is used to explore TCR sequence similarity and to investigate the architecture of the TCR repertoire, so network properties can be used to quantify the structure of the TCR network. However, the heterogeneous nature of TCR network properties makes it difficult to perform statistical learning across subjects directly, particularly when assessing their relationship with disease states, clinical outcomes, or patient characteristics. To overcome this challenge, a method called TCR-NP (TCR Network properties Prioritization) is developed. It aggregates the raw heterogeneous network properties and conducts grouped feature selection using a pseudo-variables-assisted penalized group Lasso model. Instead of the traditional approach of tuning parameters by cross-validation, a novel tuning strategy that incorporates permutation and pseudo-variables is introduced to improve selection performance. The effectiveness of the proposed method is demonstrated through simulation studies and real data analysis. Comparing the performance of the different approaches highlights the advantages of the proposed methodology in capturing the underlying relationships between TCR network properties and clinical outcomes or patient characteristics.
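To make the notion of a TCR similarity network and its properties concrete, the following is a minimal sketch and not the authors' implementation. It assumes that two sequences are connected when they have equal length and a Hamming distance of at most 1; the abstract does not state which distance or threshold is used. The toy sequences and the function names (`hamming`, `build_network`, `network_properties`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Hamming distance for equal-length strings; None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def build_network(seqs, max_dist=1):
    """Return adjacency sets linking distinct sequences within max_dist.

    ASSUMPTION: the Hamming-distance rule and the threshold of 1 are
    illustrative choices, not taken from the source.
    """
    nodes = sorted(set(seqs))
    adj = {s: set() for s in nodes}
    for a, b in combinations(nodes, 2):
        d = hamming(a, b)
        if d is not None and d <= max_dist:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_properties(adj):
    """Compute a few global and node-level properties of the network."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)

    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "components": components,
        "degree": {s: len(neighbors) for s, neighbors in adj.items()},
    }


# Hypothetical toy repertoire for one subject.
toy = ["CASSLGQ", "CASSLGE", "CASSLAQ", "CATTWDR", "CASSF"]
adj = build_network(toy)
props = network_properties(adj)
print(props)
```

In this sketch, global properties such as density yield one value per subject, whereas node-level properties such as degree yield one value per sequence, so their number differs from subject to subject. This is one way to picture the heterogeneity that motivates aggregating the raw properties before performing selection across subjects.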