Information, Operations, and Management Sciences, NYU Stern School of Business, New York, New York.
Decision, Risk, and Operations Division, Columbia Business School, New York, New York.
Big Data. 2018 Sep 1;6(3):191-213. doi: 10.1089/big.2018.0092. Epub 2018 Sep 17.
We develop several data-driven investment strategies that demonstrate how machine learning and data analytics can guide investments in peer-to-peer loans. We detail the full process, from acquiring real data from a peer-to-peer lending platform to developing and evaluating investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and the resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses at the undergraduate or graduate level. Importantly, we go beyond evaluating the predictive performance of models to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but it is also modular, which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include the following: data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
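The distinction drawn above between a model's predictive performance and the performance of a strategy built on it can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy numbers, the function names `calibration_table` and `strategy_return`, and the "pick the k loans with the lowest predicted default probability" rule are hypothetical and are not taken from the article or its data.

```python
from statistics import mean


def calibration_table(probs, outcomes, n_bins=5):
    """Bin predictions into equal-width bins and return, for each non-empty
    bin, (mean predicted probability, observed default rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            avg_pred = mean(p for p, _ in members)
            default_rate = mean(y for _, y in members)
            rows.append((avg_pred, default_rate, len(members)))
    return rows


def strategy_return(probs, realized_returns, k):
    """Average realized return of the k loans with the lowest predicted
    default probability (a deliberately simple, hypothetical strategy)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i])
    return mean(realized_returns[i] for i in ranked[:k])


# Hypothetical toy data: predicted default probability, whether the loan
# actually defaulted (1) or not (0), and the realized return on each loan.
probs = [0.05, 0.10, 0.15, 0.30, 0.45, 0.65, 0.85, 0.90]
defaulted = [0, 0, 0, 0, 1, 0, 1, 1]
returns = [0.06, 0.07, 0.08, 0.11, -0.40, 0.15, -0.55, -0.60]

table = calibration_table(probs, defaulted)
for avg_pred, default_rate, count in table:
    print(f"predicted={avg_pred:.3f} observed={default_rate:.3f} n={count}")

# Compare the model-guided strategy with simply holding every loan.
print("model-guided top-3 return:", strategy_return(probs, returns, 3))
print("hold-everything return:   ", mean(returns))
```

The calibration table checks whether predicted probabilities match observed default rates, while the last two lines ask the business question directly: does selecting loans with the model earn more than investing without it?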