
In Election 2016, Why Big Data Took a Tumble

Brendan Dixon


The presidential election results threw Silicon Valley off in more ways than one. Not only did the favored candidate fail to get elected, but the surprise result called into question the fancy algorithms and reams of data that said she would. Many pundits wrote after-the-fact essays, but one, in the New York Times, was insightful. The authors understood the import of the unexpected results:

Donald J. Trump’s victory ran counter to almost every major forecast — undercutting the belief that analyzing reams of data can accurately predict events.

The Internet has increased technology’s role in elections, both to improve the efficiency and efficacy of swaying voters and to read the tea leaves of polls and sentiment. With nearly every major forecaster predicting a Clinton win, usually with odds higher than 80 percent, her loss, and the inability of forecasters to see it coming, should rattle an undying faith in technology. As the authors note, the unexpected outcome calls into doubt the belief that we can, by harnessing immense quantities of data and grinding it through elaborate, complex algorithms and Machine Learning, know things otherwise hidden:

The election prediction business is one small aspect of a far-reaching change across industries that have increasingly become obsessed with data, the value of it and the potential to mine it for cost-saving and profit-making insights.

Yet the failure to predict the election’s results is only the most recent in a string of failures that should have taught caution. Two years ago Google Flu Trends — which used “Big Data” to predict expected flu cases — vastly overestimated the expected number. Microsoft embarrassed itself when its chatbot Tay turned its consumed data into sexist, racist rants, showing how easily data can lead systems astray.

Artificial Intelligence’s successes, such as AlphaGo’s defeat of a world-ranked Go player, with Big Data and Machine Learning contributing significant breakthroughs, duped technologists into thinking we had found the key to unlocking the future. AlphaGo’s developers did indeed succeed by using Big Data and Machine Learning. They fed millions of games, first from prior championships and then from AlphaGo playing itself, through Machine Learning to create a system that successfully estimated the quality and usefulness of various moves. But what got lost in the hype-translation is the difference between winning at Go and predicting events like elections or flus. Go, in computer-science terms, is a “perfect information” system; that is, everything you need to know to win the game is present in the game. Go presents no mysteries, no ambiguity, and no fuzzy conditions. Creating a computer to win at Go was hard because of the immense quantity of data: More valid Go boards exist than atoms in the observable universe. Predicting elections is hard both because of how much data exists and, more importantly, because of the quality of that data.
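The size comparison above can be checked with a little arithmetic. The following is a minimal sketch, not a precise count: it computes only a crude upper bound on board configurations (every point empty, black, or white, including illegal arrangements), and it assumes the commonly quoted rough figure of about 10^80 atoms in the observable universe. The number of strictly legal positions is smaller than this bound, reported at roughly 2 × 10^170, which still dwarfs the atom estimate.

```python
# Rough check of the claim that Go boards outnumber atoms.
BOARD_POINTS = 19 * 19  # 361 intersections on a standard board

# Each point is empty, black, or white, so 3**361 is an upper bound
# on board configurations (it also counts illegal ones).
upper_bound = 3 ** BOARD_POINTS

# Assumption: a commonly quoted rough estimate, not a measured value.
ATOMS_ESTIMATE = 10 ** 80


def order_of_magnitude(n: int) -> int:
    """Return the power of ten of a positive integer (digits minus one)."""
    return len(str(n)) - 1


print("Upper bound is about 10 to the power", order_of_magnitude(upper_bound))
print("Exceeds atom estimate:", upper_bound > ATOMS_ESTIMATE)
```

Even this blunt bound makes the point: the difficulty in Go is sheer volume, while every one of those boards remains fully visible and unambiguous to the player.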

Neil Postman noted decades ago that not everything of value fits into a computer. And fitting things into a computer may strip them of that which first made them valuable. Real world data, the kind useful in understanding who might win and who might lose, is replete with the fuzzy ambiguity that Big Data algorithms too easily scrape off as excess:

But data science is a technology advance with trade-offs. It can see things as never before, but also can be a blunt instrument, missing context and nuance.

The election failure of Machine Learning shows, again, that the underlying nature of machines (that answers must be yes or no, on or off, black or white) yields not just unhelpful but misleading results. How would the Clinton campaign, for example, have invested its efforts had it realized the projected results were so far from correct? How would that have affected the eventual outcome? These questions, and others, matter as technologists push Big Data, Machine Learning, and Artificial Intelligence into domains once the sole province of humans.

The lesson to draw from the failure to predict the election is this: We must be wary of mindlessly trusting results from these machines. With problems whose data resembles a game, that is, where the data is straightforward and lacks excessive ambiguity, the results of Machine Learning and Big Data will continue to improve. But for problems where the data involves nuance, judgment, and a careful reading of the clues, they will continue to produce misleading results, answers in need of an interpretation.

Human judgment and insight, especially trained human judgment and insight, are irreplaceable. We can handle the blurred edges and uncertainty that stump machines. Machine Learning and Artificial Intelligence provide useful tools to assist with mountains of data, but they cannot replicate the insight of the human mind.

Photo credit: Gage Skidmore [CC BY-SA 2.0], via Wikimedia Commons.

Brendan Dixon

Fellow, Walter Bradley Center for Natural & Artificial Intelligence
Brendan Dixon is a Software Architect with experience designing, creating, and managing projects of all sizes. His first foray into Artificial Intelligence was in the 1980s when he built an Expert System to assist in the diagnosis of software problems at IBM. Since then, he’s worked both as a Principal Engineer and Development Manager for industry leaders, such as Microsoft and Amazon, and numerous start-ups. While he spent most of that time on other types of software, he’s remained engaged and interested in Artificial Intelligence.


