Making Asimov’s Psychohistory a Reality
Using Big Data to Predict the Future
In his seminal Foundation series, legendary science fiction author Isaac Asimov introduced the world to the concept of psychohistory: the ability to predict the future using existing data. Now Google Developer Expert Kalev Leetaru is making that concept a reality. Lucy Ingham finds out more.
Once in a while a technology comes along that allows a concept from science fiction to step out of the pages and into reality. When it happens, it’s an incredible moment, applying the frame of the real world around a concept that has until that time been viewed as utterly fantastical.
But there are also technologies that are assumed to be either impossible or too far from reality to become a likely part of our lives. And until recently, Isaac Asimov’s psychohistory was, for most of us, among that group.
Forming a core component of Asimov’s Foundation series, psychohistory is a fictional scientific practice that allows trained professionals to predict future population behaviour using historical, sociological and statistical data.
“‘Any fool can tell a crisis when it arrives; the real service to society is to detect it in embryo’. When I first heard these words as a child in Isaac Asimov's Foundation series, I was fascinated by this vision of psychohistory: that we could use computers to make sense of all the data that surrounds us,” says Kalev Leetaru, acclaimed Google Developer Expert for the Google Cloud Platform and senior fellow at the George Washington University Center for Cyber & Homeland Security, in a talk at Web Summit.
“To map the present, and even perhaps forecast the future. That machines can make sense of this deluge of data that surrounds us, and make sense of the underlying patterns.
“To see the earliest glimmers of everything from epidemics to genocide, to understand what makes us tick as humans, the narratives that shape our understanding of society, the emotions that capture how we're reacting to everything occurring around us. Our global dreams, our fears.”
It’s an incredible dream, but one that most would consider well beyond our current technological reach. However, Leetaru believes it is possible, and has dedicated the last 22 years to working out how to, as he puts it, “essentially reimagine our world through data”.
“To find the subtle among the obvious at scales that no human could imagine. In short, to tell the untold stories from across the world,” he enthuses. “Data gives us an incredible new awareness through which to understand the world.”
Can we use big data to predict the future?
As ambitious a goal as predicting the future is, Leetaru sees huge potential in its realisation.
“Imagine you wind the clock to the index case of the Ebola outbreak. Imagine if we could take those myriad signals that were emerging at the time, and there were a lot of signals, and make sense of that,” he says.
“We have so much data today, but we tend to be really bad at analysing it, making sense of that data. Imagine if we could have done that: imagine if at the time period we had been able to see those signals and put them into context. Could we have saved thousands of lives?”
However, this requires the ability to analyse current and historical data, identifying patterns in much the same way as Asimov proposed with psychohistory.
“If we want to try to make psychohistory real, if we want to try to bring Asimov's vision to life, you know really if we could take all this data we already have about society and make sense of that, could we actually do something like that?” he asks.
“Well in 2011, I published a paper called Culturomics 2.0 that showed that by data mining 100 million news articles and focusing not on the physical things there, but on the latent dimensions, things like the emotional undercurrents of that content, what locations were being talked about and in what context and how, it turns out that we were able to forecast the Arab Spring and give ourselves Bin Laden's location to within 200km of where he was actually found.
“In short, there's some incredible prescience in the data that surrounds us that only becomes visible when we start teasing apart those undercurrents at massive scale.”
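Culturomics 2.0's own pipeline isn't described here, but the general idea of turning news text into a measurable emotional signal per place and time can be sketched. The following is a toy illustration only, assuming a tiny hypothetical word list (`POSITIVE`, `NEGATIVE`) and articles already tagged with a location and month; production tone dictionaries are far larger, and this is not the method used in the paper.

```python
import re
from collections import defaultdict

# Hypothetical, drastically simplified tone lexicon (illustration only).
POSITIVE = {"peace", "agreement", "growth", "celebrate", "hope"}
NEGATIVE = {"protest", "crackdown", "arrest", "violence", "fear", "unrest"}


def tone(text: str) -> float:
    """Score text as (positive - negative) word count per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 100.0 * (pos - neg) / len(words)


def monthly_tone(articles):
    """Average tone per (location, month) from (location, month, text) tuples."""
    totals = defaultdict(lambda: [0.0, 0])
    for location, month, text in articles:
        acc = totals[(location, month)]
        acc[0] += tone(text)
        acc[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Made-up articles for a placeholder country, purely to exercise the code.
articles = [
    ("CountryA", "2010-11", "Officials celebrate growth and hope for peace"),
    ("CountryA", "2010-12", "Fear spreads after arrest of activists"),
    ("CountryA", "2011-01", "Protest met with crackdown amid violence and unrest"),
]
for key, value in sorted(monthly_tone(articles).items()):
    print(key, round(value, 1))
```

A falling monthly average for one place is the kind of "emotional undercurrent" a forecasting model could then take as an input.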
This effort, however, was limited in scale in comparison to the vision psychohistory presents.
“In that particular piece I had to limit myself to whole country collapse because there simply were no datasets that focused beneath that. If I said 'give me a spreadsheet of all the labour protests that have occurred [in the] last 24 hours anywhere on Earth' that dataset simply didn't exist,” he explains.
“If I want to build a model that forecasts, for example, where we're going to see civil unrest around the world, I need to actually know where these things occur. And imagine if we could do that, imagine if we could take, for example, a labour protest that is breaking out right now somewhere on Earth and put a dot on a map, and literally as it happens, to allow machines to essentially watch the world go by.”
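The "spreadsheet" Leetaru describes is essentially a table of coded events, each with a type, a place and a time. A minimal sketch of the query he wanted, assuming a hypothetical `Event` record with an `event_type` label such as `"labour_protest"` and coordinates for plotting; these names are illustrative and are not GDELT's schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    """One coded event: a row in a hypothetical spreadsheet of world events."""
    event_type: str
    country: str
    lat: float   # latitude and longitude give the "dot on a map"
    lon: float
    when: datetime


def recent_protests(events, now, hours=24):
    """Return protest events from the last `hours`, plus a count per country."""
    cutoff = now - timedelta(hours=hours)
    rows = [
        e for e in events
        if e.event_type == "labour_protest" and cutoff <= e.when <= now
    ]
    per_country = Counter(e.country for e in rows)
    return rows, per_country


now = datetime(2017, 11, 8, 12, 0, tzinfo=timezone.utc)
events = [
    Event("labour_protest", "CountryA", 10.0, 20.0, now - timedelta(hours=2)),
    Event("labour_protest", "CountryA", 10.5, 20.5, now - timedelta(hours=30)),  # too old
    Event("labour_protest", "CountryB", -5.0, 30.0, now - timedelta(hours=5)),
    Event("election", "CountryB", -5.0, 30.0, now - timedelta(hours=1)),  # not a protest
]
rows, counts = recent_protests(events, now)
print(counts)
```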
Introducing GDELT, the project to use data to model the world
In 2013, Leetaru took a major step towards the realisation of this vision with the founding of the Global Database of Events, Language, and Tone (GDELT).
“[It’s] an open-data project designed to say 'well, what if we could take all of this open data that surrounds us and apply massive computing power and algorithms and mindsets to try to understand the world',” explains Leetaru.
A breathtakingly ambitious big data project, GDELT compiles data from 1979 to the present, with data at a daily level of detail available from 2013 onwards.
“Like any good data-driven project, it relies on a lot of data, so all the news media: print, broadcast, web in over a hundred languages. 65 of those I've machine translated,” he says. He adds that GDELT also includes vast image datasets, and that new possibilities are created by “combining textual narratives that have historically captured the world for us with these new visual narratives”.
“Being able to look at things like human rights through other types of very high-resolution data capturing specific dimensions, being able to look more creatively at things like television,” he says.
“Books allow us to look back across centuries, or academic literature: this is something oftentimes when we talk about the world we get too caught up in what's happening right now. We don't ask the more important question, which is why is this occurring? Why are these two groups fighting right now?”
The global database, however, is not just available to data experts like Leetaru. Anyone can access GDELT on Google Cloud Platform through BigQuery, the company’s cloud data-warehouse service, and doing so requires relatively little effort on the part of the user.
“This is something that I had thought about for a long time, but never been able to execute because I didn't have the time to sit down and write the code that could efficiently go through all that material,” he says. “With the advent of tools like BigQuery, someone like me can come along and really with one line of code in two and a half minutes answer that question.
“We've really reached the point where the data and the tools really aren't the limiting factors anymore, where you can say 'I wonder' and I have an answer, and that's an incredible place that we're in today.”
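As an illustration of the kind of single-statement question BigQuery makes possible, here is a sketch that builds one SQL query averaging event tone per country for a given year. It assumes GDELT's public BigQuery table `gdelt-bq.gdeltv2.events` and its `Year`, `ActionGeo_CountryCode` and `AvgTone` columns as we understand them; check the current schema before relying on it. The helper `run` uses the `google-cloud-bigquery` client library and needs Google Cloud credentials; it is not called below.

```python
def tone_by_country_sql(year: int) -> str:
    """Build a query for average event tone per country in one year.

    Table and column names are assumptions about GDELT's public BigQuery schema.
    """
    return f"""
SELECT ActionGeo_CountryCode AS country,
       COUNT(*) AS events,
       AVG(AvgTone) AS avg_tone
FROM `gdelt-bq.gdeltv2.events`
WHERE Year = {int(year)} AND ActionGeo_CountryCode IS NOT NULL
GROUP BY country
ORDER BY events DESC
LIMIT 20
""".strip()


def run(sql: str):
    """Execute the query; requires google-cloud-bigquery and GCP credentials."""
    from google.cloud import bigquery  # imported lazily so the builder works offline

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


print(tone_by_country_sql(2016))
```

Keeping the SQL in a plain function makes it easy to inspect or test without touching the cloud. On-demand BigQuery queries are billed by the amount of data they scan, so it is worth checking a query's size before running it against a table this large.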
A data visualisation created through GDELT. Images and video courtesy of the GDELT Project
Listening to the world: data beyond English
While previous efforts to model global attitudes using online resources, particularly Twitter, have focused on English, Leetaru considered the incorporation of many languages to be essential to his goal of truly modelling the world.
“Some people in Washington DC, where I'm from, a lot of people in Washington, seem to have this idea that everything on Earth is in English and there's no need to process beyond English,” he says. “And indeed most of the tools that we have to do a lot of the data mining projects are designed only for English.
“But if we want to listen to the world we obviously have to look far harder than that.”
Where possible, this has involved the use of machine translation, a technology that, while by no means fully mature, is advanced enough to be useful.
“Machine translation is far from perfect right now, [but] machine translation today is actually good enough to give us a pretty good general gist for a good number of languages today,” he says.
“Honestly there's going to be errors in that, but it's good enough to monitor this material to tell, for example, this Arabic-language article is talking about a protest, 100 people were involved, 20 people were arrested.”
At present GDELT machine translates all incoming content in 65 languages, while material in a further 35 languages is hand-translated by experts around the world, at a frequency Leetaru describes as sporadic.
With access to so many languages, GDELT can build a far wider picture of global events than narratives written in a single language could provide.
Through this, the dream of psychohistory can begin to be realised.
“We can imagine this firehose of data pouring in, all of this material being machine translated. What can we do with that? How do we go from this firehose of translated material to something that allows us to do psychohistory?” he asks.
“Well we do two things for the textual data. One is physical events: so we want to take this article that talks about a labour protest and want to convert that from a textual article to a spreadsheet entry that says here's a protest that occurred right here, there's a location, number of people involved, why they were protesting, and then be able to connect that and say, well the police arrested 20 people and then in turn these people – this occurred and this occurred.
“So being able essentially to take the firehose of textual data, and represent that in a quantitative form that machines can actually process. But oftentimes we know what's physically occurring but what we care about is the emotions. How is the world reacting to North Korea right now? How are they internalising that? So the narratives and emotions that are within society are actually far more powerful oftentimes to understand society.”
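The first of the two steps, turning an article into a spreadsheet row, is the job of an event coder. A deliberately naive sketch, assuming English (for example machine-translated) text and a few hand-written, hypothetical regular expressions; real event coding relies on full natural-language processing pipelines, and this sketch does not attempt to capture why people were protesting.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtestRecord:
    """A hypothetical spreadsheet row for one protest."""
    location: Optional[str]
    participants: Optional[int]
    arrests: Optional[int]


# Hypothetical, hand-written patterns (illustration only).
LOCATION = re.compile(r"\bin ([A-Z][a-zA-Z]+)")
PARTICIPANTS = re.compile(r"(\d+) (?:people|protesters|workers) (?:were involved|joined|marched)")
ARRESTS = re.compile(r"(\d+) (?:people|protesters) were arrested|arrested (\d+)")


def first_int(match) -> Optional[int]:
    """Return the first captured number in a regex match, or None."""
    if match is None:
        return None
    return int(next(g for g in match.groups() if g is not None))


def code_protest(text: str) -> Optional[ProtestRecord]:
    """Convert a sentence about a protest into a structured record."""
    if "protest" not in text.lower():
        return None
    loc = LOCATION.search(text)
    return ProtestRecord(
        location=loc.group(1) if loc else None,
        participants=first_int(PARTICIPANTS.search(text)),
        arrests=first_int(ARRESTS.search(text)),
    )


text = ("A labour protest broke out in Springfield; "
        "100 people were involved and 20 people were arrested.")
print(code_protest(text))
```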
With this combination of physical events and emotional data, the realisation of psychohistory begins to emerge.
“Once you have the stuff in codified form, machines can essentially watch the world go by for you, they can tell you that there's the first glimmers of instability in Burundi right now, they can tell you that five seconds ago two tanks just rolled across a bridge in Turkey,” explains Leetaru.
“They don't get tired and they can watch the entire world ceaselessly and give you those early glimmers to tell you something that a human would never have caught.”
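Once events are codified, "watching the world go by" can be as simple as comparing today's count for a place against its recent history. A minimal sketch, assuming a list of daily event counts for one location and using a trailing-window z-score with hypothetical `window` and `threshold` defaults; this is one common baseline technique, not a description of how GDELT raises alerts.

```python
from statistics import mean, pstdev


def early_warnings(daily_counts, window=7, threshold=3.0):
    """Flag days whose count exceeds the trailing mean by `threshold` std devs.

    Returns (day_index, z_score) pairs for every flagged day.
    """
    alerts = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history) or 1.0  # avoid division by zero on a flat history
        z = (daily_counts[i] - mu) / sigma
        if z >= threshold:
            alerts.append((i, round(z, 1)))
    return alerts


# Made-up daily protest counts for one location.
counts = [4, 5, 3, 4, 6, 5, 4, 5, 4, 18, 25]
print(early_warnings(counts))
```

A spike enters the baseline for the days that follow, which damps the next alert; a median-based baseline is one more robust alternative, at the cost of a little extra complexity.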
A thousand words: including images in data modelling
While getting language right is clearly important, there is another important medium that forms a core part of the global data flow: images.
“In today's world, imagery is so powerful. Today I don't post a tweet and say 'it's a beautiful day outside', I post a picture of the cloudless sky. I don't text and say 'oh wow, there's protesters marching down the street'. I live stream it,” says Leetaru.
“Imagery allows us to capture the world and convey emotions in ways that no mere words simply can. This ability to see the world visually, to for example take half a millennium of digitised books, hundreds of millions of pictures of digitised books, and to render them in a collage that shows us year by year the global dreams and fears of human society.
“To be able to look back at the older world's fascination with the Danse Macabre or the power of death over life, to the 19th century's fascination with scientific progress and innovation.”
Of course, if making sense of multiple languages is complicated, it’s nothing compared to images.
“We've never historically had tools that can actually make sense of all these images, let alone the live deluge of imagery that surrounds us each day. So much of the computing era has been focused on text; imagery we sort of discarded,” he explains.
“In fact, when it came to books, almost all the work we've done to digitise books is to make the text searchable and throw away the imagery. What if we inverted that?”
The issue here is that the tools for analysing images have been extremely limited – that is, until recently.
“We really haven't had the tools. Yeah we've had tools that can tell you 'this image has a blob that maybe kind of looks like water, and it has a lot of red in the picture’, and sure there were tools that would give you a little bit more than that, but we've really reached the point in the last two years where the tools are robust,” says Leetaru.
“Images are so powerful for the way in which they capture things, in a way that we've never really had before. And [for their] ability, for example, to map out emotions, so for example to use deep learning tools.
“This all comes from Google's Cloud Vision platform, which is their API that does incredible things with images. We can take those 300 million images and with a single line of SQL, we can take all the news imagery coming out of each country and measure the average emotion on the human faces of all those images over the past year and a half. And we can measure joy and surprise, sorrow.”
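Cloud Vision's face detection reports emotions such as joy, sorrow, anger and surprise as likelihood labels (`VERY_UNLIKELY` through `VERY_LIKELY`) rather than numbers, so averaging them per country needs a numeric scale. A sketch, assuming face annotations have already been fetched and paired with the publishing country, using the camelCase field names of the REST API's face annotations; the 0 to 4 scale is our own assumption, not Google's or GDELT's.

```python
from collections import defaultdict

# Assumed numeric scale for the likelihood labels; UNKNOWN (or missing) is skipped.
SCALE = {"VERY_UNLIKELY": 0, "UNLIKELY": 1, "POSSIBLE": 2, "LIKELY": 3, "VERY_LIKELY": 4}
EMOTIONS = ("joyLikelihood", "sorrowLikelihood", "surpriseLikelihood", "angerLikelihood")


def average_emotion_by_country(faces):
    """Average each emotion score per country.

    faces: iterable of (country, face_annotation_dict) pairs.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for country, face in faces:
        for emotion in EMOTIONS:
            label = face.get(emotion, "UNKNOWN")
            if label in SCALE:
                sums[country][emotion] += SCALE[label]
                counts[country][emotion] += 1
    return {
        country: {e: sums[country][e] / counts[country][e] for e in counts[country]}
        for country in counts
    }


# Made-up annotations for placeholder countries.
faces = [
    ("CountryA", {"joyLikelihood": "VERY_LIKELY", "sorrowLikelihood": "VERY_UNLIKELY"}),
    ("CountryA", {"joyLikelihood": "POSSIBLE", "sorrowLikelihood": "UNLIKELY"}),
    ("CountryB", {"joyLikelihood": "UNLIKELY", "sorrowLikelihood": "LIKELY"}),
]
print(average_emotion_by_country(faces))
```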
This ability to analyse and compare imagery not only by country but across the whole world unlocks unprecedented insights into the way our world is portrayed.
“We can take, for example, 300 million news images captured from news outlets all across the world from the last year and a half covering every country and ask questions,” he explains. “When we see the world through the lens of news media, what do we see? Do we see a sanitised version of the world devoid of pain and suffering? Or do we see a world that shows us what life is really like, that captures the world around us?
“To what degree does that news imagery humanise the world? Do we see imagery that just shows us landscapes and objects? Or do we really see the human impact of the global events that surround us?
“If you think about it, an image of a child running from a napalm attack became an iconic image that defined the Vietnam War to a generation in the United States, a [three-year-old's] lifeless body on a beach really brought home the tragedy and the heartbreak of the refugee crisis to the world.”
Unprecedented global knowledge: the benefits of a data-modelled world
With GDELT providing so many ways to measure and track the data from news coverage and beyond, we are now able to monitor and measure the changing world around us, and even begin to realise the dream of psychohistory.
“For the first time we can actually ask those questions of what we're seeing when we turn to the world,” says Leetaru. “We can take the trillions and trillions of connections that capture all that data and represent that as a single massive network diagram that shows us the world and that really looks like a beautiful heart. But at the same time can be zoomed in and actually offer us actual insights about the influences that are shaping the global conversation in a particular sector.”
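Such a network can be thought of as a graph whose nodes are people, places or organisations and whose edge weights count how often two of them are mentioned together. A minimal sketch of building such a co-mention graph and "zooming in" on one node, assuming each article has already been reduced to a list of entity names (the names below are placeholders).

```python
from collections import Counter
from itertools import combinations


def co_mention_network(articles):
    """Weighted edges: how often each pair of entities appears in the same article."""
    edges = Counter()
    for entities in articles:
        # Sorting makes each pair's key order-independent, e.g. (A, B) never (B, A).
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges, node, top=3):
    """Zoom in: the strongest connections of a single node."""
    linked = [((b if a == node else a), w) for (a, b), w in edges.items() if node in (a, b)]
    return sorted(linked, key=lambda item: (-item[1], item[0]))[:top]


articles = [
    ["OrgX", "CountryA", "CountryB"],
    ["CountryB", "OrgY", "CountryA"],
    ["OrgY", "CountryB"],
    ["OrgX", "CountryC"],
]
edges = co_mention_network(articles)
print(neighbours(edges, "CountryB"))
```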
One example Leetaru gives is the refugee crisis, a rapidly changing event that GDELT could provide valuable insights into.
“To be able to track both where are those refugees going, how are they moving, but also the reaction to that. To trace, for example, refugees here are being welcomed with open arms; refugees over here are being pushed away,” he says. “But more importantly, to track how that's changing, to track the areas that have previously welcomed refugees and are now pushing them away.
“That becomes very powerful if we try to think about how the world's reacting to itself, how, essentially the world itself, human society as a whole is almost this living breathing creature, and to be able to capture that.
“We can look backwards in time to look at the root undercurrents of unrest, to look at the histories of countries and what has impacted, what has occurred there and tease apart these subtle patterns. And then look to the future and take those subtle patterns and use them to actually forecast what is to come.”
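Tracking places that have previously welcomed refugees and are now pushing them away amounts to comparing a reaction score for each region across two periods. A sketch, assuming hypothetical per-region average tone scores (positive meaning welcoming) and a `margin` that ignores small swings around neutral; how such scores would be produced is outside this example.

```python
def reversals(before, after, margin=1.0):
    """Regions whose average tone flipped from welcoming to hostile between periods.

    before/after: dict mapping region -> average tone (hypothetical scores).
    margin: minimum distance from zero required on both sides, to ignore noise.
    Returns region -> change in tone, sharpest reversal first.
    """
    flipped = {}
    for region in before.keys() & after.keys():
        if before[region] >= margin and after[region] <= -margin:
            flipped[region] = round(after[region] - before[region], 2)
    return dict(sorted(flipped.items(), key=lambda kv: kv[1]))


# Made-up scores for placeholder regions.
before = {"RegionA": 2.5, "RegionB": 1.2, "RegionC": -0.5, "RegionD": 3.0}
after = {"RegionA": -1.5, "RegionB": 0.4, "RegionC": -2.0, "RegionD": -1.2}
print(reversals(before, after))
```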