Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
Toxicol Sci. 2010 Nov;118(1):251-65. doi: 10.1093/toxsci/kfq233. Epub 2010 Aug 11.
In this work, we combine the strengths of mixed-integer linear optimization (MILP) and logistic regression to predict the in vivo toxicity of chemicals using only their measured in vitro assay data. The proposed approach uses a biclustering method based on iterative optimal reordering (DiMaggio, P. A., McAllister, S. R., Floudas, C. A., Feng, X. J., Rabinowitz, J. D., and Rabitz, H. A. (2008). Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies. BMC Bioinformatics 9, 458-474; DiMaggio, P. A., McAllister, S. R., Floudas, C. A., Feng, X. J., Rabinowitz, J. D., and Rabitz, H. A. (2010b). A network flow model for biclustering via optimal re-ordering of data matrices. J. Global. Optim. 47, 343-354) to identify biclusters, that is, subsets of chemicals that respond similarly over distinct subsets of the in vitro assays. Biclustering the in vitro assays yields significant clustering by assay target (e.g., cytochrome P450 [CYP] and nuclear receptors) and by assay type (e.g., downregulated BioMAP and biochemical high-throughput screening protein kinase activity assays). An optimal MILP-based method for reordering sparse data matrices (DiMaggio, P. A., McAllister, S. R., Floudas, C. A., Feng, X. J., Li, G. Y., Rabinowitz, J. D., and Rabitz, H. A. (2010a). Enhancing molecular discovery using descriptor-free rearrangement clustering techniques for sparse data sets. AIChE J. 56, 405-418; McAllister, S. R., DiMaggio, P. A., and Floudas, C. A. (2009). Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem. J. Global. Optim. 45, 111-129) is then applied to the in vivo data set (21.7% sparse) to cluster end points with similar lowest effect level (LEL) values. The end points cluster effectively according to (1) animal species (i.e., the chronic mouse and chronic rat end points are clearly separated) and (2) similar physiological attributes (i.e., the liver-related end points and the reproductive-related end points each cluster together). Because the liver and reproductive end points exhibited the largest degree of correlation, we further analyzed them using regularized logistic regression in a rank-and-drop framework to identify which subset of in vitro features could be used for in vivo toxicity prediction. In vivo end points with similar LEL responses over the 309 chemicals (as determined by the sparse clustering results) also shared a significant subset of selected in vitro descriptors. Comparing the significant descriptors between the two categories of end points revealed that the CYP assays were specific to the liver end points, whereas the reproductive end points preferentially selected the estrogen/androgen nuclear receptors.
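The idea behind reordering a sparse matrix so that similar end points sit next to each other can be illustrated at toy scale. The sketch below assumes one common rearrangement-clustering objective: the sum of absolute differences between adjacent rows, counted only over entries observed in both rows so that missing values are skipped. It replaces the MILP with exhaustive search over row permutations. That search reaches the same optimum for this objective but is only feasible for a handful of rows. The objective choice, the function names, and the toy matrix are all hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence

# A row of a sparse data matrix; None marks an unmeasured entry.
Row = Sequence[Optional[float]]


def adjacent_cost(a: Row, b: Row) -> float:
    """Sum of |a_j - b_j| over the columns observed in both rows.

    Assumed objective term: missing entries (None) contribute nothing.
    """
    return sum(
        abs(x - y) for x, y in zip(a, b) if x is not None and y is not None
    )


def ordering_cost(rows: Sequence[Row], order: Sequence[int]) -> float:
    """Total cost of placing the rows in `order` (only adjacent pairs count)."""
    return sum(
        adjacent_cost(rows[order[k]], rows[order[k + 1]])
        for k in range(len(order) - 1)
    )


def optimal_order(rows: Sequence[Row]) -> List[int]:
    """Exhaustive search standing in for the MILP: exact, but O(n!) in rows.

    Ties are broken by lexicographic order of the permutation.
    """
    best = min(
        permutations(range(len(rows))), key=lambda p: ordering_cost(rows, p)
    )
    return list(best)


if __name__ == "__main__":
    # Hypothetical toy matrix: rows are end points, columns are chemicals,
    # entries are LEL-like values with None where no value was measured.
    lel = [
        [1.0, 1.0, None],
        [5.0, 5.0, 5.0],
        [1.0, None, 1.0],
        [5.0, None, 5.0],
    ]
    order = optimal_order(lel)
    print(order, ordering_cost(lel, order))  # prints: [1, 3, 0, 2] 4.0
```

On this toy matrix the search places the two high-value rows (1 and 3) together and the two low-value rows (0 and 2) together, even though no pair shares every column. The cited MILP and network-flow formulations are aimed at solving this kind of problem at sizes where exhaustive search is out of reach.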
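The abstract does not spell out the rank-and-drop procedure. One plausible reading, assumed here, is recursive feature elimination: fit an L2-regularized logistic regression on the current in vitro descriptors, rank the descriptors by absolute coefficient, drop the lowest-ranked one, and refit. This ranking is only meaningful when the descriptors are standardized. The plain gradient-descent solver, the penalty weight `lam`, the learning rate, and all function and feature names are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l2_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """L2-regularized logistic regression by batch gradient descent.

    The intercept is not penalized. Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual p(x) - y drives the log-loss gradient.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Mean log-loss gradient plus the L2 penalty gradient lam * w.
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def rank_and_drop(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    names: Sequence[str],
    keep: int = 1,
    lam: float = 0.1,
) -> List[str]:
    """Refit, rank features by |weight|, drop the weakest; stop at `keep`."""
    active = list(range(len(names)))
    while len(active) > keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = fit_l2_logreg(sub, y, lam=lam)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        active.pop(weakest)
    return [names[j] for j in active]


if __name__ == "__main__":
    # Hypothetical standardized assay readouts: column 0 tracks the toxicity
    # label, column 1 is balanced noise with zero correlation to the label.
    X = [[1.0, 0.5], [1.0, -0.5], [-1.0, 0.5], [-1.0, -0.5]]
    y = [1, 1, 0, 0]
    print(rank_and_drop(X, y, ["cyp_like", "uninformative"]))  # ['cyp_like']
```

Here the number of retained descriptors is fixed by `keep`. A cross-validated stopping rule is a common alternative when the right subset size is not known in advance.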