There is no such thing as ‘data’

The writer is a technology analyst

Technology is full of narratives, but one of the loudest and most persistent concerns artificial intelligence and something called “data”.

AI is the future, we are told, and it’s all about data — and data is the future, and we should own it and maybe be paid for it. And countries need data strategies and data sovereignty, too. Data is the new oil.

This is mostly nonsense. There is no such thing as “data”, it isn’t worth anything, and it doesn’t belong to you anyway.

Most obviously, data is not one thing, but innumerable different collections of information, each of them specific to a particular application, that can’t be used for anything else.

For instance, Siemens has wind turbine telemetry and Transport for London has ticket swipes, and those aren’t interchangeable. You can’t use the turbine telemetry to plan a new bus route, and if you gave both sets of data to Google or Tencent, that wouldn’t help them build a better image recognition system.

This might seem trivial put so bluntly, but it points to the uselessness of very common assertions on the lines of “China has more data” — more of what data? Meituan delivers 50mn restaurant orders a day, and that lets it build a more efficient routing algorithm, but you can’t use that for a missile guidance system. You can’t even use it to build restaurant delivery in London. “Data” does not exist — there are merely many sets of data.

Of course, when people talk about data they mostly mean “your” data — your information and the things that you do on the internet, some of which is sifted, aggregated and deployed by technology companies. We want more privacy controls, but we also think we should have ownership of that data, wherever it is.

The trouble is, most of the meaning in “your” data is not in you but in all of the interactions with other people. What you post on Instagram means very little: the signal is in who liked your posts and what else they liked, in what you liked and who else liked it, and in who follows you, who else they follow and who follows them, and so on outwards in a mesh of interactions between millions of people.

If I like your picture, that is not “my” data or “your” data alone, and it’s not worth anything without that context. You can’t take that with you because you don’t really own it, and even if you did you couldn’t plug it into TikTok, because TikTok has a completely different mesh.

This prompted my friend, the tech guru Tim O’Reilly, to say: “Data is not the new oil. It is sand.” Data is valuable only in the aggregate of millions. Indeed, this can be true even on a simple cash flow basis. For instance, in the first quarter of 2022 Meta made just 99 cents of free cash flow per daily active user per month.

This applies even to data where you can meaningfully say that it’s yours. Your electricity usage is not about other people, but it’s not valuable by itself, only in the aggregate of all domestic electricity usage in south London or Brooklyn, or wherever. And, again, data isn’t fungible: a power utility needs this data, but it’s no use to LinkedIn.

Indeed, for many of these systems the value isn’t in the data at all but in the flow of activity around it: the meaning is not in the picture or video you post but in how the network reacts to it tomorrow. You could see TikTok or PageRank as vast “mechanical Turks”. We do not yet have AI that can understand what every page, picture or video is in itself, and so we need humans, all of us, in the loop somewhere, at the right point of leverage, liking, linking, clicking and watching. These are systems, not data, and the value is in the flow.

We’ve been here before: today’s AI looks a lot like databases in the 1980s. Both technologies transform what we can do with information and what questions we can ask, and how organisations can function.

When databases were new, we worried, just as we do now about AI. Some of those worries were real, but no one today asks if America has more databases, or if it matters that SAP is German. No one at Davos talks about “SQL colonialism”.

These technologies are not strategic assets. Anyone can have them, but what for? Databases enabled just-in-time supply chains, and Walmart, and allowed Apple to make iPhones in China. Those are the strategic questions. The same is true for AI, and data. It’s not the new oil, just more software. The real question is what you build with it.

Source: Financial Times