What is “your data”? A Thotify Experiment

On the constructive use of bullshit to bypass important conversations about data collection and data privacy.

The word “data,” like the phrase “fake news,” has become bullshit. I refer here to the definition proposed by Harry Frankfurt, who argued in a 1986 essay (and then a 2005 book) that bullshit is language used when the speaker’s goal is persuasion, with no real concern for whether what is said is true. In the context of current discussions about companies like Google and their hunger for personal information, the definition is important: after all, the goal of Silicon Valley Capitalism is not to draw boundaries around what constitutes “data,” how it should be used, or how it should be collected, but to convince us that (1) we always have control of “our” data, and (2) data collection Provides Value because it Personalizes Your Experience, and so it is a Good Thing Worthy Of Your Consent.

The point where bullshit breaks down is when people interrogate the specific uses of language. Scholars in the field of information studies, for example, constantly debate the definition and interpretation of “information,” to which “data” is linked. One common perception is that “information” becomes “data” when it is captured, but the Latin root word for data actually means “something given.” For this reason, Johanna Drucker has implicitly incorporated capture into her own word, capta, signifying information that is actively taken.

It is in the domain of capta that I want to play, inspired by an interview with Spotify’s Daniel Ek that I recently heard on the increasingly irritating Freakonomics podcast. I am going to engage in a thought experiment here. It is speculative because I don’t actually know how Spotify uses customer data, and because I posit both that Spotify is not telling us and that it is interested in knowing more about how you listen to things like podcasts. The company is of interest for this experiment because it is headquartered in Europe, where the General Data Protection Regulation (GDPR) has recently set a new standard for how personal information is collected and handled.

42 minutes into the Freakonomics episode, host Stephen Dubner asks Ek how Spotify uses data. “What we do with it now is very tightly regulated,” Ek responds, referring to GDPR. “All the data that we have around you as a customer, you need to be able to ask us for it.”

“What are your abilities to monetize that data, though?” probes Dubner. Ek cites how they record things like user demographics (age, etc.) and preferred music genres, which can be used by advertisers to target content to you. “We monetize some aspect of [your data],” Ek responds (in part), “but it’s very important to note, though, that we’re not selling any customer data.”

At this point, I have two big questions and I want to call bullshit. My thought experiment outlines why.

First, what does “some aspect” of your data mean? There is a sleight-of-hand at play here, where the purveyors of online services create a false contrast between selling your data and monetizing your data. Promising that I won’t sell your name, age, and email address to a third party is table stakes, at this point, and is typical in many privacy policies. The most important thing that you should know, says Ek in his response to Dubner, is that Spotify does not sell your data. Phew! I feel so much better about my free Spotify account! But this rhetorical flourish overshadows the first statement (“we monetize some aspect” of your data) and masks the fact that Spotify is not clear about what “some aspect” means (i.e. how and what they monetize). This is because what they actually sell is what they infer from your name, age, and email address. Advertisers do not care how old you are, and do not want to buy that data; they care about what your age says about you, and want to buy your attention if what your age says about you is interesting enough. A simple number like your age may be innocuous, but information like your email address can say a great deal. Apply your human intuition to these examples: what does DrKris@stanford.edu say about the person behind it? Or adidasFan@hotmail.com? Or president@whitehouse.gov?

The ability of a company like Spotify to earn our trust is predicated on their ability to prove (or seem to prove) that we will not be abused, betrayed, or manipulated by their practices. Ek deflected Dubner’s question by pivoting from monetize to sell and then insisting that they “do not.” See what he did there?

The other reason that companies like Spotify like to collect data—which was not covered in the Freakonomics interview, but this is a thought experiment so bear with me—is so that they can improve their services. Service improvement is a very convenient loophole for companies operating in places like Canada, where PIPEDA legislation posits that information can only be collected for “purposes identified by the organization” (principle 4). So how does knowing my age help Spotify improve its service and stay out of the reach of outdated privacy legislation? In two ways, at least: first, by providing Spotify information that could be used to make the service better (by analyzing your use through data collection—isn’t that delightfully circular?); second, it helps personalize the service (“personalize” is a synonym for “improve”) by connecting advertisers to you as part of their service model, which technically also means that the information Spotify collects is dictated by what their advertisers want to know. Isn’t that delightfully irresponsible? The proof of this lies in the content of many privacy policies, which warn that blocking data collection will harm the service’s ability to work as intended. Not entirely, no! Just as intended!

The second question, and the more dangerous one, is “what is my data?” The GDPR defines personal data in Article 4 as any information relating to an identified or identifiable individual. It further defines an identifiable person as “one who can be identified, directly or indirectly, in particular by reference to an identifier… or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” This seems quite comprehensive, but there are means of escape. One with which you may be familiar is the idea that personal information is anonymized and/or aggregated, removing the link between the data and its source (i.e. you). As a result, companies can claim that data they have collected does not identify you, and therefore does not fall under the regulation. Article 15 of the GDPR states that the data subject has the right to ask if personal information is being collected, and that the data controller must provide copies of that data. If I am a data controller and I assert that your data does not identify you, I can also assert that I do not have to provide it to you. The fatal flaw is that I do not, as a data controller, have to prove that claim. If I had access to everything Spotify derived about me from my use of their service, I might be able to demonstrate, as many examples have shown, that “anonymous” data can be used to uniquely identify me. I cannot get access to that data because Spotify claims it does not render me identifiable; moreover, if I were able to show that these data rendered me identifiable, Spotify wouldn’t be able to collect them under GDPR Article 6 without my freely given consent. Isn’t that delightfully circular?

Let’s go back to the definition of “my data,” though. It may seem clear that my age (or, more specifically, my birthdate) is “my” data. But do I “own” the fact that Spotify knows what devices I use to access its service? Or when? Or where? Or how often? Or which songs I listen to on repeat? Or that I listen to classical music in the afternoons and EDM every Saturday night? Do I have a right, even under the “landmark” and “game-changing” GDPR, to know what Spotify observes about my behaviour? I do not, because Spotify claims it is not identifiable data. But: by using directly collected data to deflect conversations about how capta is used for monetization, and to fudge over a conversation about what “my data” actually is, important negotiations about privacy and autonomy are being actively sidestepped. The incentive is clear: it is not in the best interests of a corporation to be fully transparent about how it monetizes data, because full transparency in the face of regulation harms profitability. The profitability of services like Spotify is enhanced by knowing as much about me as possible, so that they can attract investment from advertisers, but also so that they can figure out what I love about their service, “perfect” my experience, and ensure that I will stay with them, perhaps even as a subscriber who pays for the privilege of being observed.

People who fail to understand why a company like Spotify would want to buy a podcast creator like Gimlet or a podcast creation platform like Anchor (no, David Court, it’s not about “content”) have successfully been deflected from the real target: behavioural data. The money isn’t in the service or the advertising; it’s in the monetization of surveillance. The insidious thing about an environment where repeated privacy violators are now claiming to be guardians of privacy is that the conversation about what constitutes privacy and what constitutes “personal data” is being driven by people who feel that regulation damages the free market. The irony there is that a truly “free market” also requires honesty and transparency but there is no sustainable reason, under the umbrella of capitalism, for the people who create online services to be honest or transparent about how they observe and record our behaviour.

That’s bullshit.

The featured image on this post is by StockSnap from Pixabay.