Whose Data, Whose Value? Simple Exercises in Data and Modeling Evaluation with Implications for Technology Law and Policy
Aileen Nielsen
Scholarship on the phenomena of big data and algorithmically-driven digital environments has largely studied these technological and economic phenomena as monolithic practices, with little interest in the varied quality of contributions by data subjects and data processors. Taking a pragmatic, industry-inspired approach to measuring the quality of contributions, this work finds evidence for a wide range of relative value contributions by data subjects. In some cases, a very small proportion of data from a few data subjects is sufficient to achieve the same performance on a given task as would be achieved with a much larger data set. Likewise, algorithmic models generated by different data processors for the same task and with the same data resources show a wide range in quality of contribution, even in highly performance-incentivized conditions. In short, contrary to the trope of data as the new oil, data subjects, and indeed individual data points within the same data set, are neither equal nor fungible. Moreover, the role of talent and skill in algorithmic development is significant, as with other forms of innovation. Both of these observations have received little, if any, attention in discussions of data governance. In this essay, I present evidence that both data subjects and data controllers exhibit significant variations in the measured value of their contributions to the standard Big Data pipeline. I then establish that such variations are worth considering in technology policy for privacy, competition, and innovation.
The observation of substantial variation among data subjects and data processors could be important in crafting appropriate law for the Big Data economy. Heterogeneity in value contribution is undertheorized in tech law scholarship and implications for privacy law, competition policy, and innovation. The work concludes by highlighting some of these implications and posing an empirical research agenda to fill in information needed to realize policies sensitive to the wide range of talent and skill exhibited by data subjects and data processors alike.