Archive for May 28, 2010

Government Spending Visualization Misses The Mark

UPDATE: Wes at Pitch Interactive has left some comments with additional information on the data and visualization. I don’t agree with his opinion on the issue of contract spending (Does the federal government spend a disproportionate amount on defense? I don’t think the data supports that, but it depends on what your opinion of “proportionate” and “appropriate” is), but you should definitely read his comments for a more complete understanding. He’s an excellent example of the government data transparency that we both endorse.

In the recent Design For America competition, a tie for first place was this very attractive visualization of Federal Spending.


When the image won the contest, it was listed as a visualization of all federal spending. After a back and forth, the author at Pitch Interactive changed the title to “Federal Contract Spending” and has stated that he will revisit the visualization so that it shows all federal spending as it is reported. Pitch Interactive has gotten beaten up a great deal over this visualization and they have been nothing but gracious throughout. So I just want to take a moment to say that I think their work is remarkable and that the problems with this graph are a series of very honest mistakes.

But one of the things my blog does is point out mistakes to increase understanding.

My biggest problem with the image is that it still perpetuates the stereotype that the federal government spends most of its money on defense. This image in particular drives that point home by ranking the spending areas according to their “media coverage” ranking where we can see the extent of media coverage each department saw (based on the New York Times API). “Defense” reporting is clearly out of proportion to Defense spending.

The first problem has been addressed elsewhere… it’s the issue of scaling the radius instead of the area of the circles. If the numbers were a correct representation of federal spending (more on that later), the circle visualization commits this “radius is not equal to area” visual error that really bugs me. I even gave it a couple pages in my book chapter (now available online for the low, low price of free) and mentioned it in my Microsoft talk on visualization because it is such a common mistake.
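To make the radius-versus-area point concrete, here is a minimal sketch. The dollar amounts are hypothetical and the helper names are invented for illustration; nothing here is taken from the original visualization's code.

```python
import math

def radius_for_area(value, scale=1.0):
    """Radius of a circle whose AREA is proportional to value."""
    return math.sqrt(scale * value / math.pi)

def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

# Hypothetical amounts: b is four times a.
a, b = 100.0, 400.0

# The mistake: use the value itself as the radius.
# The drawn areas then differ by the SQUARE of the true ratio (about 16x).
wrong_ratio = circle_area(b) / circle_area(a)

# The fix: take the square root so the drawn areas keep the true ratio (about 4x).
right_ratio = circle_area(radius_for_area(b)) / circle_area(radius_for_area(a))

print(wrong_ratio, right_ratio)
```

Because the eye reads area rather than radius, the first approach makes the larger department look far bigger than the numbers justify.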

The other problem lies in the fact that, rather than being a visualization of federal spending, it is a visualization of federal contracts. If we use the graph below as a visual of government spending, the graph above tracks only the dark green parts of the spending line.


As you can see, this kind of visualization gives a very false impression of spending because Department of Defense spending is run almost exclusively off of contracts, while Health and Human Services (which actually spent MORE money than the DoD because it distributes Medicare and Medicaid) looks like a tiny fraction. The most expensive department, the Social Security Administration, doesn’t even show up in the visualization because its money is all direct payments.

The reason this bothers me as much as it does is because the point of a visualization is to clarify and inform. One of the biggest pieces of MIS-information surrounding the federal budget is the idea that Department of Defense spending accounts for the majority of all spending. The reality is that Defense spending is about 17% of all federal spending (42%, if you only count discretionary spending and completely ignore Social Security, Medicare, Medicaid and interest on the national debt).

The original visual does the opposite of clarify and inform… it reinforces the misconception. The area that represents Defense spending is no less than 84% of total visual area! This isn’t just inaccurate, it’s exceedingly, painfully inaccurate. And, worst of all, it is inaccurate in a way that people will see it, allow it to reinforce their wrong perceptions and think that they know the truth.

But I’m a little bit shocked that the Sunlight Foundation didn’t catch these errors. It is clear to me that when Pitch Interactive gathered the data, they thought they were pulling ALL the federal spending and built the visualization off of that understanding. But Sunlight is supposed to be all about federal data. Anyone with even the most casual familiarity with the government spending data would immediately see that this visualization was in error.

Finally, it’s only fair that, after this criticism of this piece, I offer what I think is an accurate representation of the data. So I’ve re-built this visual with all the spending data and taking into account all the issues I’ve noted. Here is the fixed version of the graph (click for the large version).


Read My Chapter From Beautiful Visualization For Free!

OK, Cory Doctorow, let’s see if this works.

I’ve recently written a chapter on storytelling and information visualization for O’Reilly Media to publish in the book Beautiful Visualization.

And you can read it for free.

You can buy the e-book or pre-order Beautiful Visualization directly from O’Reilly Media or pre-order a hard copy of Beautiful Visualization from Amazon.

Quick Summary

Information visualization is a powerful medium for the communication of data, but one of the things that makes it so powerful is the fact that it plays a part in the larger story of reality. Information visualizations can tell the part of the story that is data heavy in a way that is gripping and memorable. The first half of my chapter walks through the process of creating a visualization and the importance of anchoring that visualization in the context of a larger story (including some thought-experiment examples).

The second half of the chapter walks through the entire process of creating an information visualization, from data gathering to munging (sorting, filtering, re-aligning) a large data set, to visual representation. This visualization is done using Microsoft Excel (you could use Google Docs, which is free) and Adobe Photoshop (you could use GIMP, which is free).

If you like my chapter, you will love the whole book. Beautiful Visualization is twenty chapters of unadulterated awesome visualization tips, tricks, discussions, and lessons from scientists, artists and visualization experts. Once again, your purchase options are to buy the e-book or pre-order Beautiful Visualization directly from O’Reilly Media or pre-order a hard copy of Beautiful Visualization from Amazon.

Finally, if you’re one of those people who is constantly thinking “I really wish I could financially support someone who is soooooo awesome that they put their professional work online for free”, today is your lucky day. You can donate whatever you would like directly to me as a thank you via PayPal.

But, most of all, read my chapter! Comment on it, tear it to shreds, pick out the good parts, pick out the bad parts, print it out and light it on fire. I hope it will be helpful in whatever you’re doing.

Sunlight Foundation Design For America Winners

The Sunlight Foundation contest “Design for America” has announced their winners. To my disappointment, Recovery Review was not counted among the worthy, but you should check out the winners.


Congratulations to all of the winners!


Redesigning a Government Website Winner: IRS Re-Design

How a Bill Becomes Law Winner: How Our Laws Are Made

Senate Rules Visualization Winner: Guide to US Senate Floor Procedures

Health Data Visualization Winner: County Sin Rankings

US Spending Visualization Winner (Tie): Spending vs. Media Coverage and Is Washington Bankrupting America

Best Design of Sunlight Data Winner (Tie): Cool Kids at the White House and Who Paid Them

image Visualization Winner: Making a Full Recovery


The CBPP Economic Downturn Chart is Deeply Dishonest

I was reading Jonathan Chait (who is really smart) the other day and I came across The Stigmatization Of Bush-Blaming.

A quick summary of the post: It’s right and proper to blame Bush for all the economic and federal deficit unhappiness because it’s really and for true all his fault. But if liberals continue to do it, it sounds whiny and petty. So liberals can’t say the truth because it is politically non-viable.

A compelling thesis to be sure, especially when it is supported with this chart that Chait pulled up.

And so, as I am wont to do (and apparently Chait is not wont), I looked for the origin of that chart and found this page from the CBPP (Center on Budget and Policy Priorities) that explains the chart.


I would like to decry the practice that I am hereby dubbing “source masturbation”. It is when you link to yourself to support your thesis and bury the actual source 2-3 links deep. For example, writing a blog post where you link to a post you wrote that links to a report you put out and the original source is in the footnotes. Media Matters does this kind of thing constantly and I find myself just doing a Google search of what they’re talking about rather than trying to follow the links.

This post from the CBPP is actually worse… they link to the footnotes at the bottom of the page which then link to the footnotes inside the report… the ones you just clicked on to get to the bottom of the page! Do they link at any point to anything outside their own website? No. No they do not.

This is the internet, people. Use it.


Moving along.

The problem with this chart is that it implies that:

  • the economic downturn is Bush’s fault and will continue to be Bush’s fault
  • the tax cuts will extend into 2019 and they will be Bush’s fault
  • the Iraq and Afghanistan wars are Bush’s fault.

Let’s start with the last one first. I don’t want to argue the Iraq and Afghanistan wars because it will run something like this. “You’re stupid and a jerk.” “No, you’re the stupid jerk.” “You want to murder babies.” “No, you want to let other people murder babies.” “Why didn’t you go into the Army, you chickenhawk?” “You don’t know anything about me. My sibling/friend is over there right now.” “I hope your sibling/friend is happy killing babies.” (I actually had someone say that last one to me. Dear anti-war people: you may have some valid points, but a lot of the guys on your side are complete a**holes. A**holes don’t persuade very well.)

See how much helpful ground we covered? Let’s just say for the sake of argument that Iraq and Afghanistan are all Bush’s fault and President Obama would get out of there if he could, but he’s trying to be responsible given the situation he found himself in. OK? So 10% of the deficit is due to the wars. What about the remaining 90%?

In this chart, the big item here is the Bush tax cuts. The problem with this analysis is that these tax cuts expire this year. That means President Obama will have to sign them back into law if they are going to continue to be a major factor in the budget deficit. After this year, if they are still in effect, they will be the Obama tax cuts, not the Bush tax cuts.

Furthermore, by the estimates that President Obama’s economic team has come up with, letting these cuts expire will bring in $850 billion over 10 years. (Thanks to John below for pointing out my discrepancy.) Which is like saying that letting these tax cuts expire will pay for the stimulus last year. By the CBPP analysis, getting rid of the Bush tax cuts would bring in… wait for it… $4.4 trillion over the next 9 years. Which is like paying for the stimulus 6 times over.
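A rough check of that arithmetic, as a sketch only. The one number not from the estimates above is the size of the stimulus, which I am assuming to be $787 billion (the commonly cited figure for the 2009 Recovery Act at enactment); a different figure shifts the ratios accordingly.

```python
# ASSUMPTION: stimulus size of $787 billion (commonly cited figure at enactment).
# This number is not taken from the post or from the CBPP report.
STIMULUS_BILLIONS = 787.0

def times_over(revenue_billions, cost_billions=STIMULUS_BILLIONS):
    """Return how many times a revenue figure covers a given cost."""
    return revenue_billions / cost_billions

administration_estimate = 850.0  # $850 billion over 10 years
cbpp_estimate = 4400.0           # $4.4 trillion over 9 years

print(round(times_over(administration_estimate), 1))  # 1.1
print(round(times_over(cbpp_estimate), 1))            # 5.6
```

Under that assumption the two estimates differ by a factor of about five over a similar window, which is the gap the comparison is pointing at.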

But if you look at President Obama’s budget, you see that they assume (with the typical optimism associated with any given administration’s budget report) that, with the tax cuts having expired, we will have a $900 billion deficit in 2019. By contrast, the CBPP assumes that, if we let the tax cuts expire, we would have a deficit of $630 billion.

So, putting aside the tax cuts issue, let’s address the economic downturn issue. CBPP assumes that the downturn is Bush’s fault (naturally) and that President Obama can’t do anything about it. They basically say that the “lingering effects” of the downturn (like interest on the debt we accumulated during that period) should also be blamed on Bush.

This strikes me as uncompelling. This is basically a blame shift game that isn’t appropriate for adults. Imagine if a conservative think-tank created a chart in 2008 in which we could see how much of the deficit was due to Medicare/Medicaid/Social Security/Other welfare programs. Or what if we assumed that, if Jimmy Carter had never been president, we would have avoided double digit inflation and the following recession and we extrapolated out those benefits to today?

First, those issues are counter-factual. We can’t know the present based on a theoretical past. (This is the first thing that Chait should have noticed. He’s an editor, not an economist, but he should have seen this very plain logical fallacy for what it is.) The President doesn’t get to pretend that he is dealing with part of the fiscal mess and ignore the rest of it because it’s “not his fault”. The President deals with the reality placed before him. Anything less is juvenile posturing.

Imagine that someone inherits a house. The house is seriously messed up due to the poor management of the previous owner. Cracked walls, rotting floorboards, leaky roof. The new owner comes in and promises to fix it up. It is probably unfair to complain a year later that everything isn’t perfect. But it is childish for the owner to ignore the problems by saying “That was there when I got here.”

Everyone knows there were problems. But by inheriting the house, the problems belong to the owner. The CBPP report is a way of posing a counter-factual scenario and saying “These problems don’t belong to President Obama.” I find the whole process absurd.

But the most egregious error (in my view) is the chart. It seems clear to me that the CBPP would rather build a chart to rest the blame as squarely as possible on Bush’s shoulders instead of building a chart that will increase understanding of the issue. They’ve sadly suckered Chait into this nonsense with a chart that tells him something that he might really like to be true, but that doesn’t track fiscal reality very closely.

Recovery Review Is Fully Live

Seriously, I’m going to get back to posting regularly now that this project is complete.

On Monday, I submitted Recovery Review for the Sunlight Foundation’s Design for America contest. (The project requires the installation of Silverlight if you don’t already have it installed.)

Recovery Review allows users to search and visualize stimulus data. It also allows users to flag data that they think is inaccurate. I think it’s a pretty cool little project, although I have a small list of things I’d like to improve about it. (The list isn’t comprehensive, but it’s a start.)

I’ve also started a blog for Recovery Review to go along with the project. Right now, the blog has some discussion of design decisions and the stimulus data.

One of my biggest frustrations is the fact that the data was updated on the website when I was right in the middle of my project. As a result, I’m running the project (at least until the judging is complete) on the old data, which represents everything up to 2009, Q4.

What is strange to me is that it looks like the data updates are a little frustrating. Anything about a particular project can change in the updates, from the amount of money awarded to the project to the date the project was started to the number of jobs the project “saved or created”. Sometimes these changes make sense. Sometimes they make no damn sense at all.
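One way to see exactly what changed between two data releases is to diff them project by project. This is only a sketch of the idea: the field names, project IDs and values below are hypothetical, and this is not how Recovery Review itself is built.

```python
def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

def diff_snapshots(old_snapshot, new_snapshot):
    """Map project id -> changed fields, for projects present in both snapshots."""
    report = {}
    for project_id in sorted(old_snapshot.keys() & new_snapshot.keys()):
        changes = changed_fields(old_snapshot[project_id], new_snapshot[project_id])
        if changes:
            report[project_id] = changes
    return report

# Hypothetical records keyed by a made-up project id.
older = {"A-1": {"award_amount": 50000, "start_date": "2009-06-01", "jobs": 3.0}}
newer = {"A-1": {"award_amount": 42000, "start_date": "2009-06-01", "jobs": 1.5}}

print(diff_snapshots(older, newer))
# {'A-1': {'award_amount': (50000, 42000), 'jobs': (3.0, 1.5)}}
```

A report like this at least separates the changes that make sense from the ones that need a second look.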

It looks like I picked a hell of a complex data set to work with.

Introducing Recovery Review (Alpha)

I’d like to apologize for something and then give a good reason for it.

The Sunlight Foundation is a fantastic organization that pushes for government transparency and every once in a while, they run a contest. This year, the contest is “Design for America”. It started in early March and is meant to be a 10 week design contest with several categories for entries.

When the contest started, I didn’t think I had time to build the project I wanted to build because I had a major professional conference in April. But after the conference ended, I decided I might still have time to hack something together. And hack I did.

My project is called “Recovery Review” and is meant to be a way to crowd-source the task of checking the stimulus data.

Users can search through the stimulus data given a couple key variables and get a report of the stimulus projects that match their search. They can then expand the item to a full view (the “+” icon in the top right corner) and then flag the item if it has any inaccurate or questionable data. They can also add a link if there is a news article or blog post discussing that specific project.

So, please, be my test users if you have some time. Head over there and look through the data. If anything in the data seems inaccurate to you, flag it and add to our database of what items are accurate and what items are inaccurate.

And let me know here if you run into any errors. I’m still working on refining parts of the project, but anything that breaks the project is going to be of the highest importance to me. Thanks!