Inferring phenotypes from substance use via collaborative matrix completion.

Author information

Lu Jin, Sun Jiangwen, Wang Xinyu, Kranzler Henry, Gelernter Joel, Bi Jinbo

Affiliations

Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT, USA.

Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, 3535 Market Street, Suite 500 and Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA.

Publication information

BMC Syst Biol. 2018 Nov 22;12(Suppl 6):104. doi: 10.1186/s12918-018-0623-5.

DOI:10.1186/s12918-018-0623-5

PMID:30463556

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6249733/

Abstract

BACKGROUND

Although substance use disorders (SUDs) are heritable, few genetic risk factors for them have been identified, in part due to the small sample sizes of study populations. To address this limitation, researchers have aggregated subjects from multiple existing genetic studies, but these subjects can have missing phenotypic information, including diagnostic criteria for certain substances that were not originally a focus of study. Recent advances in addiction neurobiology have shown that comorbid SUDs (e.g., the abuse of multiple substances) have similar genetic determinants, which makes it possible to infer missing SUD diagnostic criteria using criteria from another SUD and patient genotypes through statistical modeling.

RESULTS

We propose a new approach based on matrix completion techniques that integrates features of comorbid health conditions with individuals' genotypes to infer unreported diagnostic criteria for a disorder. The approach optimizes a bi-linear model that uses the interactions between known disease correlations and candidate genes to impute missing criteria. We developed an efficient stochastic and parallel algorithm that optimizes the model 20 times faster than the classic sequential algorithm. The method was tested on 3441 subjects who had both cocaine and opioid use disorders, and it inferred missing diagnostic criteria with consistently better accuracy than other recent statistical methods.
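To make the idea of a bi-linear matrix completion model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic form in which an entry of the criteria matrix is predicted as `A[i] @ U @ V.T @ B[j]`, where `A` holds per-subject side features (for example genotypes and criteria from another disorder) and `B` holds per-criterion side features. The function names, the squared-error loss, the rank, the learning rate, and the plain sequential stochastic gradient loop are all illustrative assumptions; the parallel algorithm described in the abstract is not reproduced here.

```python
import numpy as np


def fit_bilinear_completion(M, mask, A, B, rank=2, lr=0.005, reg=1e-3,
                            epochs=100, seed=0):
    """Fit M[i, j] ~ A[i] @ U @ V.T @ B[j] on observed entries with SGD.

    M    : (n, m) matrix of values; entries where mask is False are ignored
    mask : (n, m) boolean matrix, True where M is observed
    A    : (n, da) row (subject) side features   -- hypothetical input
    B    : (m, db) column (criterion) side features -- hypothetical input
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((A.shape[1], rank))
    V = 0.1 * rng.standard_normal((B.shape[1], rank))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        # Visit the observed entries in a fresh random order each epoch.
        for k in rng.permutation(len(rows)):
            i, j = rows[k], cols[k]
            p = A[i] @ U              # projected row features, shape (rank,)
            q = B[j] @ V              # projected column features, shape (rank,)
            err = p @ q - M[i, j]     # residual of the squared-error loss
            grad_U = err * np.outer(A[i], q) + reg * U
            grad_V = err * np.outer(B[j], p) + reg * V
            U -= lr * grad_U
            V -= lr * grad_V
    return U, V


def predict(A, B, U, V):
    """Return the full (n, m) matrix of predicted values, including missing ones."""
    return (A @ U) @ (B @ V).T
```

Calling `predict(A, B, U, V)` after fitting yields values for every entry, so the unobserved positions of `M` can be read off as imputations. For binary diagnostic criteria, a logistic loss and a decision threshold would be a natural substitute for the squared error used in this sketch.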

CONCLUSIONS

The proposed matrix completion imputation method is a promising tool to impute unreported or unobserved symptoms or criteria for disease diagnosis. Integrating data at multiple scales or from heterogeneous sources may help improve the accuracy of phenotype imputation.

Summary