Center for Urban Science and Progress, New York University, Brooklyn, New York.
Big Data. 2017 Sep;5(3):189-196. doi: 10.1089/big.2016.0052. Epub 2017 Aug 22.
Many municipal agencies maintain detailed and comprehensive electronic records of their interactions with citizens. These data, in combination with machine learning and statistical techniques, offer the promise of better decision making and more efficient and equitable service delivery. However, a data scientist employed by an agency to implement these techniques faces numerous and varied choices that can cumulatively have significant real-world consequences. The data scientist, who may be the only person at an agency equipped to understand the technical complexity of a predictive algorithm, therefore bears a good deal of responsibility for these judgments. In this perspective, I use a concrete example from my experience working with New York City's Administration for Children's Services to illustrate the social and technical tradeoffs that can result from choices made in each step of data analysis. Three themes underlie these tradeoffs: the importance of frequent communication between the data scientist, agency leadership, and domain experts; the agency's resources and organizational constraints; and the necessity of an ethical framework to evaluate salient costs and benefits. These themes inform specific recommendations that I provide to guide agencies that employ data scientists and rely on their work in designing, testing, and implementing predictive algorithms.