Professor David Carroll Cares About Your Data
Professor David Carroll of Parsons (yes, that Professor David Carroll) comes on to discuss how he got embroiled in an international lawsuit with Cambridge Analytica that has far-reaching implications for Facebook, marketers, and our privacy. He sees all the facets of the issue as a former entrepreneur and professor of media design.
[00:00:25] Adam Pierno: All right, welcome back to another episode of The Strategy Inside Everything. This should be a different kind of episode from what we normally do. We have a speaker who is in the media a lot, and he is also the associate professor of media design at Parsons School of Design, Professor David Carroll. How are you, sir?
[00:00:48] Professor David Carroll: Great to be here. Thanks for having me.
[00:00:51] Adam: Should I call you Professor, or should I call you David, or Dave, or how do you like to be called?
[00:00:55] David: I get called by the three, depending on the context, but we can be casual here, and you can just call me Dave if you want.
[00:01:02] Adam: [laughs] As I was emailing you, I was like, "Should I keep saying Professor, or where is the line here?"
[00:01:09] David: Some of that just probably comes from my Twitter handle, which has Prof Carroll in it, and that's mostly just because David Carroll is actually a really common name, so I had to come up with some unique identifier.
[00:01:23] Adam: Very good. Well, if you're listening to this, and you're saying, "Oh yes, Prof Carroll, why do I know that name?" Would you like, David, to tell people a little bit about what you've been doing recently, and why they've probably heard your name?
[00:01:36] David: Sure. I may have been known in marketing and publishing circles prior to the Cambridge Analytica scandal, because I was a tech entrepreneur at one point. In fact, while I was trying to commercialize my academic work into the industry, the beginnings of the Cambridge Analytica Facebook scandal were going on in 2014 to 2015. I'm most well known now for taking on Cambridge Analytica in British courts, to try and gain some transparency into this very opaque world.
I've been getting some coverage by the media for this campaign, which at times we refer to as a data quest: basically, trying to see if the European data protection model can potentially have an impact on American democracy and elections because of the way that our data was processed in England in 2016. It's a really interesting project, very relevant, and what started as an academic inquiry into figuring out how this industry works landed me in the middle of international intrigue and investigations, a cloak-and-dagger type of world.
[00:03:07] Adam: [laughs] Now, you did not set out to get embroiled in all that, I'm sure. Did it start out as just natural curiosity, trying to pull at the thread and see where it took you, or did you think you were going to be upsetting the apple cart to the extent that you are?
[00:03:22] David: Great question. I think it's a little of both, meaning my students at Parsons end up trying to decide between job offers at amazing tech companies like Google, Facebook, [unintelligible 00:03:35], Vimeo, etc. Part of my job is figuring out this industry so that we can prepare students for it, and help them figure out what they're getting into.
In the normal course of my work, even as a now former failed entrepreneur, I've always had a curiosity about the industry, and I have been a part of the industry, whether it was during my client-facing days in the early 2000s, or my product-facing days when I was a tech entrepreneur in like 2013, '14, '15, before all of this exploded. Yes, I've always been well positioned, in the right place at the right time, and so some of this was just that: being here, and knowing exactly what to look for.
[00:04:31] Adam: Natural curiosity? Tell me about the business that you started as an entrepreneur. You don't have to sell the business, but just, what area was it in?
[00:04:37] David: There is nothing to sell because it's defunct [crosstalk] but it's an interesting story of trying to commercialize academic work in such a saturated world as ad tech and publishing tech. This was, again, five years ago, almost. I had research sponsored by Hearst, actually, through the NYC Media Lab, which is a consortium of media companies, telecom companies, cable, ad agencies, etc., and universities in New York City.
There was a lot of sponsored research, and we were able to develop some interesting product around the Hearst API at the time. We presented it to the [unintelligible 00:05:32] at the end of the project, and as a joke, I said-- Well, they asked what now, and I said, "Well, you have to start a company to keep working on this," and they said, "Well, you should do that." That was not the response I was expecting, because I said it as a professorial joke.
They did write me a check, and I did try to start the company, but it did not work out for many different reasons. That's a subject for its own podcast, I suppose. That's how I got into this world of the ad tech industrial complex, learned about the LUMAscape, worked and met one-on-one with publishers, and saw how the whole digital marketing ecosystem had evolved since I first got into it back in the early 2000s.
[00:06:28] Adam: It's really interesting. Isn't it funny how quick someone else is to say you should make that a business, [laughs] when it's not their skin in the game, and they're like, "Hey, you should go start that business." That helps, yes. That's very funny. You started this business, you were trying to build it, and you started to understand all the data that existed, and that's what led you down this path of curiosity as you learned more and more about how deep it went.
As you and I were talking about before we started recording, I just came from a phone call where we were talking about technology that lets us pull spend data from customers who use our client's POS and build a profile of them, and it tells us a lot about their behaviors and their life. I'm always conflicted on those calls. As a marketer, and as a strategist, I love that data because it informs direction, it will help make it more efficient, and it will help my clients, and it will help us understand who we're building these products for.
But as a consumer, and as a human, I'm conflicted, because I say, at what point is it intrusive? Is it only intrusive if I know this is Joe Smith, and this is his personal spending data, or even at an aggregate level, where is the line? I suspect you wrestled with some of those same questions as you started learning more and more about how this data is processed and made available.
[00:08:06] David: Sure, great. I think that the Cambridge Analytica scandal is the watershed moment that has triggered a renegotiation, a tracing out of the contours of where we can and should draw the line. It also happened at the same time that the GDPR went into effect, and we're seeing the Brussels effect more broadly as a result: what we can loosely term the European model of data privacy, and how it contrasts with the US model. Then in the aftermath of the scandal we're seeing on Capitol Hill an appetite for regulating data and technology in a way that would have been unthinkable before Cambridge Analytica.
[00:08:56] Adam: Are you saying that that's a real response, that that appetite really does exist? Is there an actual opportunity for renegotiation, or is it more just that, at the academic level, people are receptive to it?
[00:09:08] David: It's a long, opaque, and amorphous process, so it's hard to pin it down. Certainly, just listening to the testimony given in the Senate and House hearings related to the aftermath, we hear Republican lawmakers and Democratic lawmakers alike referencing the European model, asking witnesses about its pros and cons. You even hear back-channel conversations between folks like Christopher Wylie, the Cambridge Analytica whistleblower, and former clients of Cambridge Analytica like Senator Ted Cruz when Wylie went to testify. Even behind the scenes Cruz told Wylie, "We've got to do something about this."
There is a sense that something needs to give because we're seeing broad impacts. I think one of the things we're learning is that prior to Cambridge Analytica, it's like AD and BC in terms of a great cataclysm, and we refer to it now that way almost jokingly. Before Cambridge Analytica, the argument was that collecting data about people for advertising and marketing causes no harm.
There is no way to prove harm, and the burden's on individuals and people to prove that it does cause harm. That was the legal principle behind it, in the United States. What Cambridge Analytica has been analogized to is an Exxon Valdez oil spill, a Deepwater Horizon event for the marketing and advertising world, because it implicates the entire ecosystem in what essentially is an environmental catastrophe of epic proportions.
[00:11:13] Adam: It's galvanized all the people against it. People that were on the fringe, saying, "I'm not sure about this," have all started looking with more attention, and saying, "Well, let me really dig into this, and I don't like what I'm finding."
[00:11:24] David: Exactly, it has a very similar effect in galvanizing opponents, and they can say, "See, I told you so." It vindicates critics. What it really brings into focus is that when we ask whether it causes personal harm, we're asking the wrong question. It actually causes collective harm. It's more like pollution. It's more like when an entire community is affected, and cancer rates go up in a community, and nobody can say for sure that the fracking well, or the Union Carbide plant in the neighborhood, caused the illness, but it's a collective harm.
It's reshaping our thoughts about where we draw the line, and at the same time, the European model is becoming more influential than it used to be in different ways. People are rethinking the European model, and rethinking its merits, and reconsidering its flaws, but the reevaluation is broad. It's pretty hard now to read an article in the press about Facebook or the general topic without Cambridge Analytica being mentioned in the article as some reference point.
[00:12:36] Adam: You're right. They're stitched together right now in the media narrative, and that's terrible for Facebook. I feel like it's good for consumers to keep having it reinforced, even for those who don't know exactly what the story is, that something happened with the data, and it's not good for you, [laughs] the end user. It's over the line.
[00:12:58] David: Yes. It's interesting that it's a global household name because it affected many, many countries, not just the United States. It's almost this household term around the world. Any household that tuned in has probably heard the word in some kind of context, and just has a negative association with it, and probably strongly associates it with Facebook, but doesn't understand why.
I doubt that the average person, if there is such a thing, if polled and questioned and surveyed about it, could clearly explain why they feel bad about it, but they could certainly clearly express the negative emotion about the whole escapade. It speaks to the complexity of these issues that they're visceral even though they're complicated.
[00:13:50] Adam: Yes, I think you're right, and the Exxon Valdez example is really smart. The way it became the one event that everybody could wrap their head around as the symptom of what was going on. You started a business, and you were using or learning about the data. If you were starting a business today, knowing now what you know, tell me, what are your thoughts now for marketers or for business people as they have access to all this data, how do you think people can handle that responsibly, and what have you learned or what is your viewpoint on it?
[00:14:30] David: In 2015, 2014, I have a specific memory of sitting down with my developer, who was a co-founder, and we were doing our Facebook integration into our platform. We were doing the API integration at the time, and the developer was showing me the incredible amount of personal data that we could harvest from getting someone to connect their Facebook account.
It was shocking, and it was visceral to see what kind of code we could just go in. Also, simply by installing analytics platforms to study how users were using our platform, to learn how to make it better and to make a case to investors that it was worth investing in, just the default settings allowed us to watch individual users click through the site in real time.
It was an innocuous intent. It was just to let you see what features were being used. The whole point of the enterprise was to try stuff, see if it works, see if people will pay for it, see if you could start a business. That very fundamental necessity to build a business in the digital realm requires a surveillance infrastructure by default. The question is what is acceptable and what is invasive. For me, it was just seeing how the sausage was made on the other side, building the stuff, and seeing how the default settings were way more invasive than even I thought.
When the company failed, I was liberated to come out and say, "I have some concerns about what's going on here, and I think it's going to blow up in everyone's faces." At the same time that I was doing this, there was a group of folks at Cambridge University doing the same thing, basically, using the Facebook social API to build a personality quiz that would ultimately harvest up to 87 million people's profiles, and intimate traits about their lives that they may not understand, and that may be used in ways that they would not expect.
Certainly, it was collected in a manner that was surreptitious, and it was collected for a very specific political purpose that was not disclosed at the time. It showed the recklessness of that era, and I think it was the end of an era. Back to your question about what I would tell entrepreneurs today. It would be that the days of the wild west have come to a close, and it's a good thing, and it can be good for business.
The question is recalibrating around it, and with data, the question becomes how to be just smarter about data, and use it more wisely and economically: what is the economy of data collection, how is data a liability and an asset, how does it represent profit and loss, how is it a burden and an opportunity to create wealth. Having a more balanced view of data is the key thing that is different now.
[00:18:17] Adam: Yes, I agree. Nobody talks about where it can be a loss or where it can be a cost center. The prevailing model right now is just collect everything you can, scrape whatever you need to scrape, go get it, but nobody talks about the downside of gathering all that data or the responsibility of it. It's more of just like, "Let's go get it."
[00:18:41] David: Yes, and I think things like the European model and potentially things like the California privacy model that has just been passed, they point toward where the liability rests. It rests in the responsibility to individual users, to grant them access rights, which is something that businesses in the United States are not prepared for. There will be an initial cost to comply with the new requirements that would come, that are already there in Europe.
Then once those costs are covered, from there on, you will see where the data is more of a liability in a company because, for example, the retention requirements will change. It becomes an obligation to get rid of data. Even just one person asking for their data can turn a corporation upside down because it has to be traced through a lot of the corporation, and most businesses are just not used to asking themselves this question: Where is all the data related to this person? Then what is the definition of personal data itself? That itself is disputed [unintelligible 00:20:13].
[00:20:15] Adam: That's the question, because if we can't define that, if we can't agree on that, then we can't figure out where the line is of what is fair to collect, and what's not fair to collect.
[00:20:26] David: What's remarkable about the regulatory aftereffect of Cambridge Analytica and Facebook that we can already see is the British data cop, the Information Commissioner's Office, the one who is responsible for regulating Cambridge Analytica, basically, and who fined Facebook for violating the Data Protection Act in principle, and is criminally prosecuting Cambridge Analytica, and really the company is still-- In its report called Democracy Disrupted, which was released last month, they talked specifically about inferred data as personal data.
This is the key question for marketers and advertisers alike. A lot of what is traded as data is predictions and inferences about behavior. Some parties in the industry like to consider that non-personal data, even if it is attached to identities, and even if those identities are pseudonymous or somehow masked or aggregated. In the end, they really are inferences attached to individuals, even in the abstract, and the UK, Europe, etc., will, moving forward, view inferred data as personal data.
That is, the minute you attach an inference about somebody to their identifier, it becomes personal data in that regime. That's pretty significant because what it means is that, for example, the personality models that were created out of the University of Cambridge methodology and were derived from Facebook accounts, and then were reattached to voter files to generate new predictive models, this is all absolutely, definitely personal data in the view of the regulators. I think what is significant for the United States to grapple with is that inferred behavior is personal data in the post-Cambridge Analytica world.
[00:22:55] Adam: Well, let's grapple with it a little bit here. Let's say, by that logic, even doing a broad MRI analysis of a demographic. Let's say we take 18 to 24-year-old males across the US, and we pull all the habits of those people. That in itself is not personal, because it's super aggregated, and there are no personal identifiers. But if I create a persona, and then I start to say, "Well, they do these things, and they go to the movies twice a month, and so, therefore I think I can sell them a Dr Pepper every time they go to this theater chain," where is the line? Where does that cross the threshold into personal data, using even something that I consider pretty broad in MRI, which is great data for a marketer, but it's still not as specific as what we're offering up as consumers on social media?
[00:23:56] David: I think what's really fascinating about the response to this question is the role of auditing and forensics to determine the technical moment where an inferred data point gets attached to an identity. The Information Commissioner's Office has been conducting what is arguably the most complex, largest, and best-resourced data forensics investigation in history. They have been reviewing hundreds of terabytes of data, and their interim report that was released in July just comments on the things to come, and the challenge of doing the forensics on all the data.
But in October, they anticipate reporting the details of the forensics. I think at that point in October, we will get a very important historical investigation yielding a really precise determination of where data got personal. There will be technical events that occurred where, for example, pseudonymous [unintelligible 00:25:11] were attached to voter files which were associated with Facebook accounts.
Those precise moments would be where aggregates and look-alikes, etc., become PII, in the industry parlance. When pseudonymous identifiers get reattached to PII identifiers, of course, those are moments when we create personal data. But there are probably also moments where aggregate data, or inferred or predictive data that comes from a model, which has a confidence value against it, gets attached to PII. Those moments, and what vendor was responsible: that's where the rubber hits the road.
[00:26:03] Adam: Well, that's going to be a moment of reckoning. [laughs] That [unintelligible 00:26:06] setting is going to be huge.
[00:26:11] David: Yes, there are so many different players in the complexity of this opaque ecosystem, and part of the challenge of the European model is creating accountability through the supply chain.
[00:26:25] Adam: Yes, because it goes from the provider, from the Facebooks that are gathering data and harvesting it and in some cases using it, to ad networks that are layering it, to the global advertising holding company networks that are using that to sell media to media buyers, and even down to individual small businesses that are using the self-serve Facebook system or the Google system, where they don't really have to know that much to go in and create a look-alike, or create a model for a customer. So if they cross the line too, it's much more far-reaching. This is going to be huge.
[00:27:05] David: I think that speaks to how significant the decision was, for example, for Facebook to terminate their third-party data broker relationships in the aftermath of Cambridge Analytica, which would be data matching on the back end. In Facebook, you have the data matching on the front end and the back end. You have it on the front end with custom audiences, where people can be matched by name on their Facebook account when an advertiser uploads contact lists, and then data matching on the back end, where Facebook is enriching people's profiles through [unintelligible 00:27:47] and others.
The disruption of that on one end, I think, speaks to the challenge of being GDPR compliant, but also Facebook's necessity to be defensive in this new reality, and to be more conservative, cautious, and careful about attaching personal data to predictive data.
[00:28:19] Adam: Really interesting stuff. Let me ask you this theoretical, hypothetical question. You started a business in which you learned how much data there was, and the business ultimately didn't succeed. You were able to follow that data and start this quest. What if the business had succeeded? Do you think you would have had the discipline to make the attempt at handling that differently, or do you think you would have just gone with the flow and kept using what was available? Have you thought about that at all?
[00:28:57] David: It's a great question. It's similar to the question that I get. It's like, if Hillary Clinton had won the presidency, would I even care about Cambridge Analytica? Well, it's hard to answer that question. I won't cop out, I will take a stab at it. But it's hard to answer that question because I was not able to raise money to fund it, and that's ultimately why it died. I couldn't get even enough seed investors to build it to even successfully test-market it. Some of that was probably because I just had an ethical dilemma, and couldn't get over it in order to get investors, and they probably sensed my unease. It should have been easy enough to get investment in a startup that uses terms like AI and machine learning, even a couple of years ago, but I wasn't able--
[00:30:07] Adam: It should be a slam dunk.
[00:30:09] David: Exactly, people should have been throwing money at me because on paper it looked good. It had all the right things in certain ways. Nobody would write me a check after Hearst beyond some friends and family. Why was that? It was because I was not selling it hard enough, the way it had to be sold, and I was not driven by data monetization in the way that was required by investors. It's hard to imagine, because I definitely had all the stakes, all the pressure of starting a business, and that wasn't enough.
[00:30:50] Adam: All right, Professor Carroll, David, thank you so much for being here. This was a really, really great conversation. I know I learned a lot. I have a feeling if you and I were together we could talk about this more, just on the hypotheticals of where the lines are and what's potentially right and wrong and what makes them so. Thank you for making time. I appreciate it.
[00:31:13] David: Great to be here. Thanks for having me.