Election Data Science and the Death of Truth
The U.S. presidential election is finally over. The protests are winding down, they’ve stopped burning cars in Oakland (for now), and the talk of California secession is waning. But I am struggling to return to “normal” because in this election, truth got hammered.
Many candidates treated opinions as “truth,” and a large portion of the American public grabbed hold of these “truths” as gospel. It may have been a good time to be in the “fact checking” business, but I’m not sure how effective even the fact checkers could be given how spontaneously “opinions as facts” were being thrown around, not to mention the people who create fake news intentionally.
So let’s play a game! Let’s call this game “Separate The Truth From The Myths.” Let’s see how you do.
- Bat Boy Sighted in NYC Subway (probably too expensive to get a condo in Manhattan)
- Obama Appoints Martian Ambassador (but the Senate will request Matt Damon since he’s already lived and farmed on Mars)
- Skynet is a Reality (Hey, even Iron Man showed up at the Senate to tell them so!)
- Ted Cruz Shot JFK (okay, so it actually was his dad, but accusing Ted Cruz is funnier)
All but one of these stories appeared in the highly credible “National Enquirer” or “Weekly World News.” Expecting the “truth” from them is like buying a copy of “Mad Magazine” (for you old timers) or reading “The Onion” (for you young whippersnappers) and expecting the “truth” from those satirical publications (see Figure 1).
However, the stories in Figure 2 below were plastered across social media sites as if they were the truth and, as you can see from the engagement numbers, lots of people took the time to read these “truths.”
Data Science And Common Sense
As data scientists, we need to know not to accept the “truth” without applying some common sense. For all the fancy training in neural networks, artificial intelligence and machine learning, it’s hard to replace “common sense” as a necessary data scientist characteristic. Let’s walk through an example of how a data scientist might approach one of the sensational stories that recently popped up on social media (see Figure 3).
OMG, murders are up 10.8% in the biggest percentage increase since 1971, according to a highly credible source like the FBI. It’s become the “Walking Dead” out there!
Sensational headlines grab attention and incite fear and dread. “Dirty Laundry” sells. But the problem with data at the aggregate level is that it:
- Distorts the real truth (or root cause) of the problem, and
- Is not actionable
The above headline could lead to the conclusion that the current criminal and rehabilitation policies have failed and everything should be thrown out. But there are no details as to what aspects of these programs are broken and no triage of the root causes in order to explore what might be done to fix the problem. As data scientists, we must demand the granular details so that we can turn the data into insights and make the information actionable, such as:
- NOTE: Full-year homicide numbers were only available for 2015 since we are still in 2016, but for select cities, the numbers are only getting worse in 2016. For example, through November 2016, Chicago has already had a 56 percent increase, with 251 more murders in 2016 than in 2015 (http://www.chicagotribune.com/news/local/breaking/ct-chicago-violence-700-homicides-met-20161201-story.html).
- Just ten large cities accounted for 524 additional murders, or about a third (~34%) of last year’s 1,532 additional murders nationwide (https://www.theguardian.com/us-news/2016/sep/30/us-murder-rate-chicago-fbi-data-police). Here is the breakdown from this article:
This is a good starting point. If we want to address the increase in murders, we need to drill into each individual murder (and attempted murder) in those 10 cities. We need to keep drilling into the granular details in order to identify those variables and metrics that might be predictors of murders and attempted murders.
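Before drilling down, it is worth a quick common-sense check that the quoted figures hang together. The sketch below uses only the numbers cited above; it assumes that 1,532 is the nationwide year-over-year increase (the figure the ten-city share is measured against) and that the percentage increases are computed against the prior year's total. The variable names are just illustrative.

```python
# Figures quoted in the text above; no new data is introduced here.
top10_extra = 524       # additional murders in the ten large cities
national_extra = 1532   # ASSUMPTION: the nationwide year-over-year increase
headline_pct = 10.8     # reported nationwide percentage increase

# Share of the nationwide increase that came from the ten cities.
share_top10 = top10_extra / national_extra

# Prior-year nationwide total implied by "1,532 more is a 10.8% increase".
implied_prior_year_total = national_extra / (headline_pct / 100)

# Chicago through November 2016: 251 more murders, a 56 percent increase.
chicago_extra = 251
chicago_pct = 56
implied_chicago_2015 = chicago_extra / (chicago_pct / 100)
implied_chicago_2016 = implied_chicago_2015 + chicago_extra

print(f"Ten-city share of the increase: {share_top10:.1%}")
print(f"Implied prior-year nationwide total: {implied_prior_year_total:,.0f}")
print(f"Implied Chicago total through Nov 2016: {implied_chicago_2016:,.0f}")
```

If the implied totals look wildly different from what the sources report, that is a cue to go back and check what each quoted number actually measures.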
For example, we could identify the specific blocks of these cities where the murders are occurring, the time of day and day of week, the time of the year, or any special events that occurred right before the murders. We could explore other variables that might be indicative of an increase in murder (e.g., % of broken homes, % of children born out of wedlock, % of high school dropouts, % of drug addicts, unemployment rate among male adults, increase in graffiti).
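Here is a minimal sketch of that kind of drill-down. The incident records, city names and field names are all hypothetical, invented purely for illustration; real incident data would come from each city's police department and would need far more cleaning.

```python
from collections import Counter
from datetime import datetime

# HYPOTHETICAL incident records, invented for illustration only.
incidents = [
    {"city": "City A", "block": "100 Main St", "when": "2015-07-04 23:15"},
    {"city": "City A", "block": "100 Main St", "when": "2015-07-11 22:40"},
    {"city": "City A", "block": "500 Oak Ave", "when": "2015-07-08 14:05"},
    {"city": "City B", "block": "20 Elm St", "when": "2015-08-01 01:30"},
    {"city": "City A", "block": "100 Main St", "when": "2015-08-01 21:10"},
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def drill_down(records):
    """Count incidents by block, by day of week, and by day/night band."""
    by_block, by_weekday, by_band = Counter(), Counter(), Counter()
    for rec in records:
        ts = datetime.strptime(rec["when"], "%Y-%m-%d %H:%M")
        by_block[(rec["city"], rec["block"])] += 1
        by_weekday[WEEKDAYS[ts.weekday()]] += 1
        # "night" is an arbitrary band here: 8pm up to (not including) 6am.
        band = "night" if ts.hour >= 20 or ts.hour < 6 else "day"
        by_band[band] += 1
    return by_block, by_weekday, by_band


by_block, by_weekday, by_band = drill_down(incidents)
print("Hot spots:", by_block.most_common(2))
print("By weekday:", by_weekday.most_common())
print("By time band:", by_band.most_common())
```

Even with toy data, the aggregate count of five incidents says nothing, while the breakdown points at one block, one day of the week and one time band, which is something a city could actually act on.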
Once we know those variables that are predictive of murders, then we have a focus as to where we can start fixing the problem, taking corrective actions such as adding more police or community outreach, reducing high school dropouts, increasing drug arrests, testing different programs and approaches, measuring program effectiveness, learning and improving. Now that’s thinking like a data scientist!
Data Scientist Lessons Learned
What are the lessons that we can take away from this “opinions as facts” syndrome?
- Common sense is critical. Don’t accept “truths” at face value. Demand more details in order to identify and quantify those variables and metrics that might be predictive or indicative of the researched problem.
- You can’t fix the business – or the country – without drilling into the details and the potential causal factors. We need insights that are drawn from facts that are supported by granular data so that we know what actions to take. With these detailed insights in hand, we now know where to invest our scarce financial and human resources.
- Details matter. At the aggregate level, the headlines may be sensational, but they are not insightful or actionable until you get into the details. Remember Simpson’s Paradox!
- Data quality, accuracy and reasonableness are important, especially if you are trying to make business-impactful decisions based upon that data. Business users, if they are expected to use the data to support decisions, must have confidence in the data. “Facts as Facts” are critical if we want to overcome decisions being made on traditional bases such as gut feel, hearsay and history.
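For anyone who needs a refresher on Simpson’s Paradox, here is a small sketch of how an aggregate number can point the opposite way from every segment beneath it. The two programs, the neighborhood segments and all the counts are hypothetical, chosen only because they reproduce the effect.

```python
# HYPOTHETICAL counts: (successes, cases) per program and neighborhood type.
outcomes = {
    "Program A": {"low-risk": (81, 87), "high-risk": (192, 263)},
    "Program B": {"low-risk": (234, 270), "high-risk": (55, 80)},
}


def segment_rate(program, segment):
    """Success rate for one program within one segment."""
    successes, cases = outcomes[program][segment]
    return successes / cases


def overall_rate(program):
    """Success rate for one program with all segments lumped together."""
    successes = sum(s for s, _ in outcomes[program].values())
    cases = sum(n for _, n in outcomes[program].values())
    return successes / cases


for program in outcomes:
    detail = ", ".join(
        f"{segment}: {segment_rate(program, segment):.1%}"
        for segment in outcomes[program]
    )
    print(f"{program} -> {detail}, overall: {overall_rate(program):.1%}")
```

In this made-up data, Program A does better in both the low-risk and the high-risk segments, yet Program B looks better in the aggregate because it was mostly applied to the easier cases. The headline number alone would send the money to the wrong program.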
A good data scientist learns not to trust anything at first blush. Opinions might suggest variables and metrics that could be better predictors of performance, but in the end the data scientist needs to validate each of these variables and metrics to quantify whether they really are better predictors of performance.
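As a rough first pass at that kind of validation, one could score each candidate variable against the outcome with a simple Pearson correlation. The sketch below is only one possible screening step, not a full validation; the neighborhood-level numbers and variable names are hypothetical, and a strong correlation still says nothing about causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data for six neighborhoods, invented for illustration only.
incident_counts = [2, 4, 5, 7, 9, 12]
candidates = {
    "unemployment_rate": [4.0, 5.5, 6.0, 8.0, 9.5, 12.0],
    "graffiti_reports": [30, 12, 25, 18, 28, 15],
}

# Score every candidate variable against the outcome.
scores = {name: pearson(values, incident_counts)
          for name, values in candidates.items()}

# Rank candidates by the strength (absolute value) of the correlation.
for name, r in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.2f}")
```

In this toy data the “obvious” graffiti variable turns out to be a much weaker signal than unemployment, which is exactly why each opinion-driven candidate needs to be checked against the data rather than assumed.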
In the movie “Star Wars: A New Hope,” the weak-minded stormtroopers were easily dissuaded from pursuing the truth about the droids by Obi-Wan Kenobi’s use of the Jedi Mind Trick to plant the “truth” in their weak minds.
Don’t be weak-minded about seeking the truth. Use your common sense to challenge the “truth,” and get into the granular details so that you can identify and quantify those variables and metrics that are better predictors or indicators of the problems.
And beware the “These aren’t the Droids you’re looking for” syndrome. That’s for the weak-minded.