Archive for data

Religious Outliers Nonsense (or "Atheists Are Richer Than Religious People If You Take All Poor Atheists Out Of Your Sample")

Charles Blow’s most recent New York Times op-ed is something of a boon for visualization enthusiasts. He replaces almost his entire article with a visualization. This illustrates that he recognizes the power of visual communication to make and reinforce a point in a way that is self-evident and can stick with the reader better than words.

Unfortunately, he has decided to use data that misleads his audience to such an extent that I can only conclude that he is unconcerned with the truth insofar as it undermines his desired objective.

Blow’s main point is that the US is an outlier in the world because we’re religious but also rich while “religiosity was highly correlated to poverty”.

I’ve reproduced the chart in question below.

image

Now, keep in mind that this is not charting religion as it is listed in the CIA World Factbook, but according to the specific question: “Is religion an important part of your daily life?” That will be important in a little bit.

This chart seems to prove his point. Until you realize what isn’t on the map.

Here is a list of the countries that didn’t manage to make their way onto the map due to the fact that Gallup didn’t poll them:

China – 1.33 billion people, heavily non-religious, poor

North Korea – 22 million people, heavily non-religious, unbelievably poor

Cuba – 11 million people, presumed non-religious, poor

Taiwan – 23 million people, 93% Buddhist*, rich (comparable to Japan)

Problem number one – Charles Blow has a duty to inform his audience of these omissions. The countries without data represent nearly 25% of the world population and skew heavily toward non-religious. They are too large and too important to the data set and visual reference to simply ignore. Yet Mr. Blow doesn’t seem interested in mentioning them.

Problem number two – Mr. Blow heavily implies that there is a causal relationship between religiosity and wealth. But (as we all know) correlation doesn’t imply causation. Western European countries (and countries filled with people from Western Europe) are richer, as are developed Asian countries. Eastern European and South American countries are less rich. Middle Eastern and African countries tend to be much poorer. There’s a correlation in geo-political histories here that is stronger than religion.

Of course Mr. Blow could always go to rural India and inform them that their poverty is related to their devotion to Hinduism and has nothing to do with British imperialism. Or perhaps to the Deep South, where he can proclaim to the 90%+ Christian black population that their economic woes are related to their religious tendencies.

Problem number three – The final problem is the worst one because it involves an outright lie:

Singapore is more religious and richer than the United States. And Mr. Blow didn’t map it. At all.

It’s possible that Mr. Blow is actually so numerically illiterate that he didn’t know he was supposed to tell people about key missing data points. But taking out data that doesn’t align with his point is disgusting manipulation. The end result of his deception (conscious or otherwise) is “If you take out all the poor atheists and take out all the rich religious people, then this pattern emerges…”

Mr. Blow should put Singapore back into the data set and add a correction to his article that announces how his data set has enormous gaping holes. And he should probably never be allowed to touch charting software again.

* The CIA Factbook has Taiwan listed at 93% Buddhist, but I’m not sure how they would answer the specific question that Gallup asked. I’ve heard some atheists claim Buddhism as an “atheistic religion” (no personal god) so it could be that the citizens of Taiwan wouldn’t say that religion plays a big role. I simply don’t know.

Dean Baker Completely Unaware How BLS Calculates Unemployment Rate

Holy crap, how did Dean Baker from the Center for Economic and Policy Research get a job writing about economics?

Check out this line of his report on the recent unemployment numbers:

The unemployment rate fell to 9.7 percent in May, primarily as a result of 411,000 temporary Census jobs.

Mr. Baker is apparently unaware of how we calculate the unemployment rate. Let’s help him out here.

First we take the number of people who have jobs. This is not the “non-farm payrolls” number, which increased by 431,000. It is the “Employed, 16 years and over” number which decreased 35,000 (from 139.455 million to 139.420 million). Mr. Baker seems to have those two numbers confused, so I thought I’d clarify.

Then we take the number of people who are looking for jobs but can’t find them. This is where we get the “unemployed” number, which decreased from 15.260 million to 14.973 million.

Then we add the employed number to the unemployed number to get the labor force number. To calculate the unemployment rate, we divide the number of people unemployed by the labor force.

139,420,000 + 14,973,000 = 154,393,000 people in the labor force

14,973,000 / 154,393,000 = 9.7% Unemployment rate

Let’s try to prove Mr. Baker’s statement that the unemployment rate dropped to 9.7% “primarily” due to the 411,000 census jobs. We’ll subtract 411,000 from the “employed” number.

139,009,000 + 14,973,000 = 153,982,000 people in the labor force

14,973,000 / 153,982,000 = 9.7% Unemployment rate

We get the exact same unemployment rate with or without the census jobs. That is because the unemployment rate dropped due to people leaving the labor force. And most of the people who left the labor force came from the “unemployed” category, otherwise known as “discouraged workers”.
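The comparison above is easy to check yourself. Here is a minimal Python sketch that uses only the figures quoted in this post; the function and variable names are my own, and this is just the simple ratio described above, not any official BLS code.

```python
def unemployment_rate(employed, unemployed):
    """Return the unemployment rate as a percentage of the labor force."""
    labor_force = employed + unemployed
    return 100.0 * unemployed / labor_force


# Figures quoted in the post (household survey, May).
employed = 139_420_000
unemployed = 14_973_000
census_jobs = 411_000

# Rate as reported, and rate if the census jobs are removed from "employed".
with_census = unemployment_rate(employed, unemployed)
without_census = unemployment_rate(employed - census_jobs, unemployed)

print(f"With census jobs:    {with_census:.1f}%")
print(f"Without census jobs: {without_census:.1f}%")
```

Both lines round to the same one-decimal figure, which is the whole point: removing 411,000 jobs barely moves a ratio whose denominator is over 150 million.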

This is part of the reason that economic understanding is so dismal among the general public. An economic reporter should be able to get the simple facts right about a job report.

Read My Chapter From Beautiful Visualization For Free!

OK, Cory Doctorow, let’s see if this works.

I’ve recently written a chapter on storytelling and information visualization for O’Reilly Media to publish in the book Beautiful Visualization.

And you can read it for free.

You can buy the e-book or pre-order Beautiful Visualization directly from O’Reilly Media, or pre-order a hard copy of Beautiful Visualization from Amazon.com.

Quick Summary

Information visualization is a powerful medium for the communication of data, but one of the things that makes it so powerful is the fact that it plays a part in the larger story of reality. Information visualizations can tell the part of the story that is data heavy in a way that is gripping and memorable. The first half of my chapter walks through the process of creating a visualization and the importance of anchoring that visualization in the context of a larger story (including some thought-experiment examples).

The second half of the chapter walks through the entire process of creating an information visualization, from data gathering to munging (sorting, filtering, re-aligning) a large data set, to visual representation. This visualization is done using Microsoft Excel (you could use Google Docs, which is free) and Adobe Photoshop (you could use GIMP, which is free).

If you like my chapter, you will love the whole book. Beautiful Visualization is twenty chapters of unadulterated awesome visualization tips, tricks, discussions, and lessons from scientists, artists and visualization experts. Once again, your purchase options are to buy the e-book or pre-order Beautiful Visualization directly from O’Reilly Media, or pre-order a hard copy of Beautiful Visualization from Amazon.

Finally, if you’re one of those people who is constantly thinking “I really wish I could financially support someone who is soooooo awesome that they put their professional work online for free”, today is your lucky day. You can donate whatever you would like directly to me as a thank you via PayPal.


But, most of all, read my chapter! Comment on it, tear it to shreds, pick out the good parts, pick out the bad parts, print it out and light it on fire. I hope it will be helpful in whatever you’re doing.

Debunking the “Republican Congress Creates Jobs” Chart Or “How To Make Numbers Say Anything You Want”

This is a companion piece to the previous post, so please read both of them. Here I’m going to lay out the script I had written for debunking the chart I created that asked the question “Does a Republican Congress Create More Jobs?” and then implied with a chart that this was indeed the case. I’ll walk through some of the process for creating charts and then talk about why I would create a chart that I was just going to debunk.

I apologize for the similarity to the post where I debunk the Obama stimulus chart. These two scripts were meant to be together.

<Start Script>

How to Make Numbers Say Anything You Want

Do you want to convince people that your side is right with only the flimsiest proof? Does the idea of tricking people with numbers make you all happy inside? Then come join us as we walk through “How To Use Charts To Say Anything”

Step 1: Massaging the Data

The first step is to grab the data that makes your point the best. Let’s use it to prove that a Democratic Congress is bad for jobs.

“How can we do such a thing?” you ask.

In the first case, the raw jobs data looks like this

but the final chart looks like this.

How did they do that? Was it magic?

Nope, we simply smoothed the data. The raw data is a little too chaotic and has too many data points to tell the straightforward story that we want. So instead, we’ll average the monthly data so that we have quarterly data. There… now we have some nice smooth straightforward data.
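For anyone who wants to see the smoothing trick in action, here is a rough Python sketch of averaging monthly numbers into quarterly ones. The monthly values are made up for illustration (they are not the BLS series behind the chart), and the function name is my own.

```python
def quarterly_average(monthly):
    """Average each consecutive group of three monthly values.

    Any leftover months that do not fill a whole quarter are dropped.
    """
    whole_quarters = len(monthly) - len(monthly) % 3
    return [sum(monthly[i:i + 3]) / 3 for i in range(0, whole_quarters, 3)]


# Hypothetical monthly job changes in thousands (NOT real data).
monthly_change = [120, -60, 210, 90, 300, -30, 60, 150, 30, -90, 240, 120]

quarterly_change = quarterly_average(monthly_change)

# The spread (max minus min) shrinks a lot once the data is averaged.
monthly_spread = max(monthly_change) - min(monthly_change)
quarterly_spread = max(quarterly_change) - min(quarterly_change)

print(quarterly_change)
print(monthly_spread, quarterly_spread)
```

Twelve jumpy points become four calm ones. Nothing in the data is false, but the chaos that might make a viewer hesitate has been averaged away.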

Step 2: Pick colors that make you look good

Next, we pick some colors. Let’s make the Democrats’ blue dark and bold to give it a bit of an angry feel. This is our way of getting the audience to look at the Democrats in a harsh way. We could try to soften up on the Republicans more, but too soft of a red would look pink and we don’t want that.

Let’s compare our colors to the Excel defaults:

Step 3: Do NOT give any context!

Finally, and this is the most important part, only give information that is helpful.

Let everyone know that we saw 8 million jobs added to the economy while the Republicans were in charge and make a point to show that we lost 8 million jobs while the Democrats were in charge. But don’t mention that the Republicans took Congress only a year after 9/11 at a time when the job market was particularly low. Otherwise people will think it’s a “Well, they can’t fall off the floor” thing.

And make sure you don’t mention anything about the real estate market and how the bubble drove the labor market in a way that was clearly unsustainable. We don’t want the viewers to be confused with all these relevant details. We want them to say “Republicans good, Democrats bad”.

<End Script>

Everyone here was incredibly kind to put up with my bullshit chart for as long as I left it up without explanation. I’d like to say unequivocally: My chart is propaganda… just like the Obama administration’s chart. I was trying to use my chart as a visual talking point that said:

If you have no ethical qualms, data visualizations can be manipulated to say exactly what you want them to say.

My chart implies that the Republicans were responsible for the jobs growth between 2003 and 2007 and that Democrats were responsible for the drastic decline from 2007 to the present. Let me state plainly, I do not think that is the case.

But if we just play around with the data the right way, we get what seems to be a clear picture that portrays a correlation and gets on its hands and knees and begs us to draw causation from it. Most people will do exactly that.

I can spend hours walking patiently through what is wrong with the Obama administration’s chart. Let me recap the high points here:

  • If you look at the data with the context of what President Obama’s team was hoping the stimulus would do, the power of the chart disappears.
  • If you look at the data with the understanding that they’re charting a first derivative, you realize that we haven’t gained jobs, we’re just losing them more slowly and the power of the chart disappears.
  • If you look at the data with the understanding that they didn’t even start spending the stimulus until the job loss had started slowing down, the power of the chart disappears.
  • If you look at the data in the context of other recessions, you’ll realize that, far from showing a drastic improvement, the numbers represent a devastatingly slow jobs recovery compared to other recoveries and the power of the chart disappears.
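The first-derivative point in the list above is the one people trip over most, so here is a small Python sketch of it. Every number here is invented for illustration; none of it is the administration’s data.

```python
from itertools import accumulate

# Hypothetical monthly job changes in thousands (made up for illustration).
# This is the "first derivative": how much the job count changed each month.
monthly_change = [-700, -600, -450, -300, -150, -50]

# Hypothetical starting job level in thousands.
starting_jobs = 135_000

# Running total: the actual number of jobs after each month.
job_level = list(accumulate(monthly_change, initial=starting_jobs))

# The chart of changes "improves" every month...
improving = all(b > a for a, b in zip(monthly_change, monthly_change[1:]))
# ...while the actual job level falls every month.
still_falling = all(b < a for a, b in zip(job_level, job_level[1:]))

print(job_level)
print(improving, still_falling)
```

A chart of `monthly_change` climbs steadily toward zero and looks like a recovery. A chart of `job_level` never goes up once.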

But this kind of explanatory rebuttal would only interest those already convinced. The chart I made had a power that a calm explanatory video wouldn’t have. Quite frankly, I hate that this is the case. Like President Obama’s chart, my chart doesn’t teach people anything about economics or lead people to learn important things about unemployment.

The only valuable thing my chart teaches is that charts can portray accurate data and still be manipulated to coach people along to poor conclusions. The only reason I even put my chart up is because it is the graphical equivalent of drawing out the Obama administration’s argument to its logical conclusion. My chart works with the same data, the same assumptions, and the same implications. And it leads to a completely different conclusion.

I’ve heard people describe President Obama’s chart as “powerful” and “brilliant”. The popular information visualization blog Flowing Data even tossed it up for public discussion among info viz professionals.

My point here is that it isn’t brilliant. It’s juvenile. It’s the chart equivalent of a crass political cartoon with a Snidely Whiplash mustache drawn on the bad guys. It’s a design trick imagined by cynical, self-congratulatory children fresh out of graduate school who pat themselves on the back for their ability to fool people who they think are too stupid to know the difference. They think they are special because they can get powerful people to flatter them for their ability to lie.

But they aren’t special. I can play that same childish game in my free time. The difference is that I want people to know that it’s a trick. They would rather see people fooled.

Glenn Beck Tries to Duplicate My Visual, Messes Up The Math

Last week, I posted a new video on the recent budget freeze using colored cups of water (seen here).

The following Wednesday, during the morning Glenn Beck radio show, Glenn was introduced to my work. Apparently he liked it so much that he had his own version by the evening.

I made a lot of noise on Twitter about him taking my video, but that was because I thought he actually took my video as opposed to translating it into a similar idea. Taking my idea… who cares? I’m hardly in this for the money; if people understand something better than they did before and they were true to the data, I’m happy.

But that was the problem: I don’t know who was doing the math for the demonstration, but it was way off.

Let’s assume that the 100 gallons of water represented the spending over the next 10 years.

The reason we’re making this assumption is not because it makes sense but because we’re giving Glenn the benefit of the doubt. (Glenn implies that we’re looking at the budget for 2011, but he never says that so I don’t want to lock his meaning into something he might not have meant.) According to President Obama’s 2011 budget, we expect to spend $45.9 trillion from 2011 to 2020.

Let’s also assume that Glenn is using the “$250 billion saved over 10 years” number to represent the amount of money saved. I assume this because that’s the only number that I’ve seen that is “over the next 10 years”. If this is the case, then Glenn says that what looks like a shot glass (about 2 ounces of water) represents $250 billion.

I don’t know who did the calculations, but they got it pretty far off. If $45.9 trillion is equal to 100 gallons, then $250 billion is equal to a 2-liter bottle of water.

That’s a lot of water to chug and not nearly as impressive a visual as the little shot glass. But it is accurate.
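If you want to check the water math, here is a quick Python sketch. It assumes US liquid gallons and US fluid ounces, and it takes the 2-ounce shot glass estimate from above at face value; the function names are my own.

```python
LITERS_PER_GALLON = 3.785411784  # US liquid gallon (assumed unit)
OUNCES_PER_GALLON = 128          # US fluid ounces per US gallon

TOTAL_SPENDING = 45.9e12   # dollars, 2011 to 2020, as quoted above
TOTAL_WATER_GALLONS = 100  # the water standing in for all that spending
SAVINGS = 250e9            # dollars saved over 10 years


def dollars_to_gallons(dollars):
    """Scale a dollar amount to its share of the 100 gallons."""
    return dollars / TOTAL_SPENDING * TOTAL_WATER_GALLONS


def gallons_to_dollars(gallons):
    """Scale a volume of water back to the dollars it represents."""
    return gallons / TOTAL_WATER_GALLONS * TOTAL_SPENDING


# How much water should $250 billion be?
savings_liters = dollars_to_gallons(SAVINGS) * LITERS_PER_GALLON

# How many dollars does a 2-ounce shot glass actually represent?
shot_glass_dollars = gallons_to_dollars(2 / OUNCES_PER_GALLON)

print(f"$250 billion is about {savings_liters:.2f} liters")
print(f"A 2 oz shot glass is about ${shot_glass_dollars / 1e9:.1f} billion")
```

Under these assumptions the savings come out a little over 2 liters, while the shot glass stands for only a few billion dollars, a small fraction of the $250 billion it was supposed to show.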

Like I said before, taking my idea is fine if you think it helps other people understand something better. But maybe next time someone should drop me a line to make sure you get your numbers right.

January Employment Numbers or How To Lose 1.1 Million Jobs But Keep The Unemployment Rate The Same

It’s that time again! First Friday of the month and job numbers for the previous month are released.

This month’s unemployment rate is (drumroll please…)

9.7%!

That’s a 0.3 percentage point decline in unemployment. But, if you’ve been reading this blog long enough you know that we can’t just let that go without caveats.

The Good Stuff

Good news is a matter of context. This news isn’t so much good in the sense that the economy is super awesome so much as it is good in the sense that it isn’t actively bad. But I’ll take “not actively bad” for now.

  • My big “thing” for the last half-year is that the nature of the unemployment rate calculation is hiding a huge story, which is the disappearance of the labor force. This latest report has the labor force increasing for the first time since August.
  • Actual employment increased a little over half a million (although comparing January to December is difficult because the population controls are altered at the beginning of every year). This is the biggest increase in approximately forever (since 2007).
  • The people who want a job but aren’t looking for one declined about 350,000.

The Bad Stuff

The good news on the job numbers is basically that they don’t suck. The bad news is that they really only look good when compared to December. Compare them to further back and they still look kind of questionable.

For example, the last time we had 9.7% unemployment (back in August), we had 139.4 million jobs. In January, we had the same unemployment rate, but only 138.3 million jobs. In 5 months, the unemployment rate is the same, but we’ve lost 1.1 million jobs. Mathematically stated:

August, 2009:
15.0 million people looking for jobs
divided by
154.43 million people employed or looking for jobs
equals
9.7% unemployment (139.4 million actual jobs)

January, 2010:
14.8 million people looking for jobs
divided by
153.17 million people employed or looking for jobs
equals
9.7% unemployment (138.3 million actual jobs)
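The two calculations above can be checked with a short Python sketch. It uses only the rounded figures listed here (in millions), so the job counts it derives can differ from the published ones by a rounding error; the function name is my own.

```python
def summarize(unemployed_millions, labor_force_millions):
    """Return (employed in millions, unemployment rate in percent)."""
    employed = labor_force_millions - unemployed_millions
    rate = 100.0 * unemployed_millions / labor_force_millions
    return employed, rate


# Rounded figures from the two months compared above.
aug_employed, aug_rate = summarize(15.0, 154.43)
jan_employed, jan_rate = summarize(14.8, 153.17)

jobs_lost = aug_employed - jan_employed

print(f"August 2009:  {aug_rate:.1f}% unemployment, {aug_employed:.1f}M employed")
print(f"January 2010: {jan_rate:.1f}% unemployment, {jan_employed:.1f}M employed")
print(f"Jobs lost:    {jobs_lost:.1f}M")
```

The rate rounds to the same value in both months even though the employed count drops by roughly a million, because the numerator and denominator shrank together.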

Welcome to the world of job statistics. Basically, the unemployment rate is a measurement of people with jobs vs. people who are looking for jobs. If people stop looking for jobs for whatever reason, they move out of the “unemployed” category and even though a new job has not been created, the unemployment rate has gone down. Incidentally, I have a video on this phenomenon that I made back in October (aired in November).

The Weird Stuff

You may have seen a headline along the lines of “Payrolls Decline 20K as Unemployment Rate Falls to 9.7%” and said, “What the…?” This is basically the messy nature of statistics poking its ugly head up through the facade of mathematical certainty.

More simply stated: These numbers are estimates based on a set of surveys. The surveys are gigantic (tens of thousands of respondents) and employ certain controls to try to give a more accurate picture of what is going on. But there are two different surveys: one for household data and one for establishment data. Usually they show a similar picture. Right now they are showing a different one. The household survey says we have more jobs than last month while the establishment survey says we have fewer.

These numbers are not now, nor have they ever been, exact. But they’re good enough for government work (get it!).

The Political Power of Data Visualization

It’s funny… I got my start in information visualization with an Ignite presentation on data visualization about a year ago. (For those who don’t know, Ignite is a get-together of geeks and artists where people give a 5-minute presentation on something they love with 20 slides where the slides auto-progress every 15 seconds.) About a month later, I posted my first political infoviz and that started the hoopla that led to this blog.

And I just found another fantastic Ignite presentation on “The Political Power of Data Visualization”. This one is done by Alex Lundry (Twitter: @alexlundry), who actually works in the field with Target Point Consulting. (I don’t work in the field… I’m just a hobbyist.) That means if you want Info Viz done for your company or think tank, you should go to him.

Alex makes the exceptionally important point that visual communication and thinking (especially with info graphics) is THE wave of the future for policy communication. Let me be as clear as possible about this:

If you talk, you lose. If you show, you win.

Period. Showing people something makes them feel like they discovered it. It is THE way to convince people.

Visual thinking. Infographics. Data visualizations. Make them. Use them.

By the way, here’s my original Ignite presentation for anyone interested (skip the first 20 seconds, it’s embarrassing).

Visualizing the CRU E-Mails

Very cool visualization of the Climategate e-mails over here. For more information see the Computational Legal Studies blog post.

Additionally, they have hub and authority scores for the authors of the e-mails. I like.

Thanks to Pankaj Gupta and Drew Conway for pointing me to this.

ClimateGate: Free The Data

I wanted to get this out because I’m quickly becoming consumed with other things. But I’ve been following the ClimateGate scandal for coming up on a week now. And every time I turn around it looks worse for anthropogenic global warming.

For those of you who don’t know what I’m talking about, here’s a quick summary:

Someone stole (or possibly leaked) a ton of files and e-mails from the Climate Research Unit.

My position on climate change has heretofore been: “I’m not a climate scientist, but there seems to be a pretty significant agreement among those who are that the main points of climate change are solid. The earth is warming and humans are causing it to some degree. The extent to which humans are causing it (do we account for 90% of the change? 50%? 30%?) and what to do about it seems to still be a matter of debate.”

I’ve read a number of the journal articles on the matter just because I’m interested enough in what is going on and my inclination is to get as close to the data as I can.

Because that’s my thing. Data.

Everything about data is vital to the scientific process. How we collect it, how we analyze it, how we compare different sets… these things are desperately important to good scientific work. When data gets too big, we use statistical analysis to understand it and models to predict what will happen next.

Most importantly, for science to work we need people to check our work. The next scientist down the line should be able to work his way to the same conclusion in order to be able to rely on moving toward the next conclusion. Verification is the heart and soul of the scientific process.

And the process is more important than the result. If you don’t believe me, go read up on Fermat’s last theorem. Pierre de Fermat made a conjecture in 1637 that turned out to be true, but mathematicians couldn’t prove it for over 300 years. That the conjecture was true is important, but how we know it is true is the key part.

That is why I am so pissed off at the scientists at CRU. If you read their e-mails (a good collection of what they say has been compiled by Bishop Hill), they spend a ton of energy making sure other people can’t do independent verification of their data. They attack people who disagree with them, not because those people have bad data or use poor process, but because the results are not consistent with the message the CRU scientists are trying to propagate.

Add to that the fact that the CRU e-mails reveal an almost violent disregard for proper scientific peer review in favor of bullying journals into accepting only appropriate papers. And they make no bones about it: Appropriate is defined in relation to the desired result. If the result is different from what they want to hear, they work tirelessly to politically punish people who found those results.

And we haven’t even started talking about the code.

I have a solution to this, one that I believe is non-partisan and vital to future work:

  • If a paper is going to be referenced in an IPCC report, its authors need to post all the data, an explanation of the process, and the code for the paper where anyone can look at it and verify it.
  • Any grants that are offered with federal money should require public access to the data, the process and the modeling code. If “the people” bought the research, we should be able to look at it, not just at some 10 page summary report.
  • Any paper used for public policy purposes should hold the same requirement.

In short, this is a call to free the data. We can’t make decisions in the dark. If these guys have done good science, anyone with an appropriate expertise will be able to verify it.

Is this unfair to climate scientists? A violation of intellectual property?

Forgive me if I don’t give a sh**. These guys have crapped all over the scientific method and made a mockery of objective science. This kind of bad PR will take years, possibly decades, to overcome. If they want to keep their data to themselves, they can get a private firm to support their research and stop using their findings to push public policy.

Take note: This does not mean that the conclusions the CRU scientists have come to are wrong. They could be 100% right and still be huge assholes who want to hide their data from everyone else. But we have no reason to believe that they are 100% right because we can’t see the data and we don’t know their process. Just because you cheer the deaths of your opponents doesn’t make you wrong. In the future it’s going to take more to convince me than “But the scientists SAID SO!”

Also, given the blatant and horrific way in which these people have manipulated the peer review process, the “But the skeptics aren’t published in peer reviewed journals” argument is a pretty sh***y line of attack from here on out. Just from reading the e-mails, we can see that:

  1. That isn’t even remotely true
  2. Manipulation of the peer review process has been a top priority for these scientists, to the point of intentionally ruining careers and lives.

From here on out, they can have my confidence in their results when I see their data.