OKCupid, Religion and Readability

OKCupid has been doing some interesting things recently with data mined from its users, which is, on the one hand, fascinating and, on the other, extremely creepy.

It all boils down to some pretty interesting data points (what white/black/Asian/Hispanic people like or have in common, what men and women of different races focus on, etc.). The one thing that shocked me a little was the graphic at the end.

I was a little surprised that the Protestant reading level was so low. And when I am surprised by data, I try to see if I can replicate it. (This strikes me as an eminently scientific thing to do, and yet I am bemused when people think I’m “attacking” their data. Whatever.)

Note: I should mention at the outset that it irks me that OKCupid displays this data and then more or less assumes it holds for all people across the board, when it is obvious to anyone who devotes more than 5 seconds to thinking about it that the data really only holds for OKCupid users, who (I’m going out on a limb here) are probably disproportionately young, tech-savvy and single.

Moving along. To determine reading levels they ran the Coleman-Liau Index on the profiles, so I typed up two sample religious profile summaries, one Christian and one atheist. They’re only a couple of sentences long, which I figure is fine since OKCupid profile summaries aren’t exactly known for their complex narrative arcs.

Here are the profiles that I typed up, attempting to mimic what I thought would be a fair religious summary at a similar reading level.

Atheist: I am an atheist. I believe that there is no God and that most people only believe religion because they are taught to do so by society and possibly also their parents.

Christian: I am a Christian. I believe that Jesus died on the cross for my sins and that he was raised again on the third day. I think that the Bible teaches us the truth and that God loves us very much.
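The Coleman-Liau index is simple enough to compute yourself. It is 0.0588·L - 0.296·S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. Below is a minimal sketch that scores the two profiles above. It makes its own simplifying assumptions about what counts as a word, a letter and a sentence. Online testers differ on exactly these rules, so the numbers it prints are unlikely to match the tester scores reported in this post. The function name `coleman_liau` is hypothetical.

```python
import re


def coleman_liau(text: str) -> float:
    """Coleman-Liau grade level: 0.0588*L - 0.296*S - 15.8.

    Counting rules are an assumption of this sketch:
    - a word is a run of ASCII letters (internal apostrophes allowed),
    - letters are the alphabetic characters inside those words,
    - a sentence ends at each run of '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for word in words for ch in word)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


atheist = ("I am an atheist. I believe that there is no God and that most people "
           "only believe religion because they are taught to do so by society "
           "and possibly also their parents.")
christian = ("I am a Christian. I believe that Jesus died on the cross for my sins "
             "and that he was raised again on the third day. I think that the Bible "
             "teaches us the truth and that God loves us very much.")

for name, profile in [("Christian", christian), ("Atheist", atheist)]:
    print(f"{name}: {coleman_liau(profile):.2f} grade level")
```

The formula rewards long words and long sentences. It knows nothing about ideas, so a profile built from short, familiar words scores low no matter what it says.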

I ran them through a Coleman-Liau tester to see what popped up.

Christian: 5.47 grade level

Atheist: 10.15 grade level

I laughed.

Really? Those little blurbs are so radically different that the Christian one is 4.68 grades stupider based on nothing more than a readability analysis? Sounds like BS to me.

Let’s try adding some evolution in there:

Atheist: Same as before + “When it comes to the world around us, evolution is the most likely explanation for everything.”

Score: 12.23

Christian: Same as before + “When it comes to the world around us, I think there are probably gaps in evolutionary theory and that evolution can’t explain everything.”

Score: 8.90

Well, that closes the gap by 1.35 points, but we’re still looking at a 3.33-grade gap between two positions that are transparently written in an identical textual style.
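For anyone who wants to check the gap arithmetic, here is a small sketch. It uses only the tester scores reported above.

```python
# Grade-level scores as reported in this post (from the online tester).
base = {"Christian": 5.47, "Atheist": 10.15}
with_evolution = {"Christian": 8.90, "Atheist": 12.23}

gap_base = base["Atheist"] - base["Christian"]
gap_evolution = with_evolution["Atheist"] - with_evolution["Christian"]

print(f"Initial gap:            {gap_base:.2f} grades")
print(f"Gap with evolution line: {gap_evolution:.2f} grades")
print(f"Gap closed by:          {gap_base - gap_evolution:.2f} grades")
```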

I progressively added more and more to the Christian profile to counteract the low score I got from starting with the basics of Christian belief.

Finally I ended up with:

I am a Christian. I believe that Jesus died on the cross for my sins and that he was raised again on the third day. I think that the Bible is true and that God loves us very much. I think there are probably gaps in evolutionary theory and that evolution can’t explain everything. Additionally, the philosophical underpinnings for views that argue against Christianity frequently neglect to apply the same standard of ideological rigor to their own faith based assumptions. Consequently, they hold Christianity to a double standard assuming that their position is the default one and that there is no need to defend it.

Score: 12.33

This “Christian profile view” scores about as well as an absurdly simple statement of atheism with a supporting line about evolutionary theory. Basically, the algorithm they used rates “6th grade atheism” at the same level of textual complexity as “basic Christian beliefs + a philosophy degree”. (I flatter myself somewhat, but the final two lines are clearly in a college-level writing style.)

Here is what I’m not saying: I don’t think there is any conspiracy behind any of this. No one designed the algorithm so that Christians would look stupid.

However!

It seems likely that a simple statement of Christian belief like the one entered above anchors the score at the low end. The more someone communicates their Christian belief in the language that has been familiar in churches for centuries, the less likely they are to score well, regardless of the rest of their profile. This anchoring effect might get lost if the profile were a three-page essay. But profiles tend to be short, simple statements meant to clearly indicate basic beliefs, inclinations, or personality traits.

Note: I promise I’ll pull back on religious topics. I just get irritated when people pull out “evidence” of religious people being inferior in some way, shape, or form. It usually strikes me as hackery that the creators or purveyors of the data set are perfectly happy to accept, so they neglect to do any sort of skeptical follow-up.

You can actually see the same thing with a lot of war-based data. Half the time, the people pointing to the data didn’t even get the data right, and a good chunk of the remaining examples strip context out of the data. Bugs the hell out of me.

Religious Outliers Nonsense (or “Atheists Are Richer Than Religious People If You Take All Poor Atheists Out Of Your Sample”)

Charles Blow’s most recent New York Times op-ed is something of a boon for visualization enthusiasts. He replaces almost his entire article with a visualization. This shows that he recognizes the power of visual communication to make and reinforce a point in a way that is self-evident and can stick with the reader better than words.

Unfortunately, he has decided to use data that misleads his audience to such an extent that I can only conclude that he is unconcerned with the truth insofar as it undermines his desired objective.

Blow’s main point is that the US is an outlier in the world because we’re religious but also rich while “religiosity was highly correlated to poverty”.

I’ve reproduced the chart in question below.

Now, keep in mind that this is not charting religion as it is listed in the CIA World Factbook, but according to the specific question: “Is religion an important part of your daily life?” That will be important in a little bit.

This chart seems to prove his point. Until you realize what isn’t on the map.

Here is a list of the countries that didn’t make it onto the map because Gallup didn’t poll them:

China – 1.33 billion people, heavily non-religious, poor

North Korea – 22 million people, heavily non-religious, unbelievably poor

Cuba – 11 million people, presumed non-religious, poor

Taiwan – 23 million people, 93% Buddhist*, rich (comparable to Japan)

Problem number one – Charles Blow has a duty to inform his audience of these omissions. The countries without data represent nearly 25% of the world population and skew heavily toward non-religious. They are too large and too important to the data set and visual reference to simply ignore. Yet Mr. Blow doesn’t seem interested in mentioning them.
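A quick way to size the omission is to add up the populations listed above and divide by the world population. The world figure here is an assumption and does not come from the original: roughly 6.9 billion around 2010. Under that assumption, the four listed countries alone come to about a fifth of the world population. Any other countries Gallup skipped would add to that.

```python
# Populations of unpolled countries, as listed in this post.
unpolled = {
    "China": 1_330_000_000,
    "North Korea": 22_000_000,
    "Cuba": 11_000_000,
    "Taiwan": 23_000_000,
}

# ASSUMPTION: approximate world population around 2010 (not from the original).
WORLD_POPULATION = 6_900_000_000

missing = sum(unpolled.values())
share = missing / WORLD_POPULATION
print(f"Unpolled population (listed countries): {missing:,}")
print(f"Share of world population: {share:.1%}")
```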

Problem number two – Mr. Blow heavily implies that there is a causal relationship between religiosity and wealth. But (as we all know) correlation doesn’t imply causation. Western European countries (and countries settled largely by people from Western Europe) are richer, as are developed Asian countries. Eastern European and South American countries are less rich. Middle Eastern and African countries tend to be much poorer. There’s a correlation with geopolitical history here that is stronger than religion.

Of course, Mr. Blow could always go to rural India and inform people there that their poverty is related to their devotion to Hinduism and has nothing to do with British imperialism. Or perhaps to the Deep South, where he can proclaim to the 90%-plus Christian black population that their economic woes are related to their religious tendencies.

Problem number three – The final problem is the worst one because it involves an outright lie:

Singapore is more religious and richer than the United States. And Mr. Blow didn’t map it. At all.

It’s possible that Mr. Blow is actually so numerically illiterate that he didn’t know he was supposed to tell people about key missing data points. But taking out data that doesn’t align with his point is disgusting manipulation. The end result of his deception (conscious or otherwise) is “If you take out all the poor atheists and take out all the rich religious people, then this pattern emerges…”
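The selection effect described here is easy to demonstrate. Below is a minimal sketch using HYPOTHETICAL toy numbers: four made-up countries, not Gallup or World Bank data. It computes a plain Pearson correlation between religiosity and wealth. With all four points there is no correlation at all. Drop the rich religious country and the poor secular one, and you get a perfect negative correlation. The helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


# HYPOTHETICAL toy data: (percent saying religion is important, wealth index).
countries = {
    "poor religious": (90, 1),
    "rich religious": (90, 9),
    "rich secular": (10, 9),
    "poor secular": (10, 1),
}


def correlation_without(data, excluded=()):
    """Correlation after dropping the named countries from the sample."""
    kept = [point for name, point in data.items() if name not in excluded]
    xs, ys = zip(*kept)
    return pearson(xs, ys)


print(f"All countries: r = {correlation_without(countries):+.2f}")
print(f"Filtered:      r = "
      f"{correlation_without(countries, {'rich religious', 'poor secular'}):+.2f}")
```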

Mr. Blow should put Singapore back into the data set and add a correction to his article acknowledging that his data set has enormous gaping holes. And he should probably never be allowed to touch charting software again.

* The CIA Factbook has Taiwan listed at 93% Buddhist, but I’m not sure how Taiwanese respondents would answer the specific question that Gallup asked. I’ve heard some atheists describe Buddhism as an “atheistic religion” (no personal god), so it could be that the citizens of Taiwan wouldn’t say that religion plays a big role. I simply don’t know.

Current Recovery “Faster & Stronger” According to CNN Money’s Chris Isidore

A recent CNN Money piece by senior writer Chris Isidore was titled “7.9 Million Jobs Lost Forever”. First, let’s set aside the idea that anything is really “forever”, or the idea that we have a totally inelastic workforce that, being filled entirely with idiots, never adjusts to the economic realities that confront us. Let us instead focus on his statement:

“Excluding temporary Census workers, the economy has added fewer than 100,000 jobs a month this year — a much faster and stronger jobs recovery than occurred following the last two recessions in 2001 and 1991.”

Let’s check that statement. Let’s take the number of jobs at the end of each recession (as defined by this Wikipedia piece on recessions) and see how jobs fared from the end of the recession onward, which we’ll define as “the jobs recovery”. The chart below shows the number of jobs in the months following the end of each recession.
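The comparison above lines recessions up at their end month. Here is a minimal sketch of that alignment, under one assumption: each recession's path is expressed as a percent change from its own end-month payroll level. Percentages let recessions from different decades share one axis, although the original chart may plot raw job counts instead. The function name is hypothetical, and the sample series is made-up toy data, not BLS figures.

```python
def jobs_since_recession_end(payrolls, end_index):
    """Percent change in payroll jobs relative to the month a recession ended.

    payrolls:  monthly payroll employment levels (any unit), oldest first.
    end_index: position in `payrolls` of the month the recession ended.
    Returns one value per month from the end month onward; the first is 0.0.
    """
    if not 0 <= end_index < len(payrolls):
        raise IndexError("end_index is outside the series")
    base = payrolls[end_index]
    return [(level / base - 1) * 100 for level in payrolls[end_index:]]


# HYPOTHETICAL toy series (thousands of jobs), not real BLS data.
toy = [130_000, 129_000, 129_500, 130_300]
print(jobs_since_recession_end(toy, 1))
```

Running this once per recession, each with its own end month, gives curves that all start at zero and can be read side by side.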

As you can see, following the end of the recession, this “recovery” is vastly worse on jobs than previous recoveries. Now, it could be that what Mr. Isidore meant to say is “in the last 6 months, we’ve seen jobs growth much faster than after the last two recessions”. That is the same as saying “See the growth between the two red arrows?”

“It’s awesome. I would like you to infer that, based on this very limited view of jobs growth, President Obama is doing awesome on the jobs front.”

First of all, that kind of statement is why people think that reporters (even reporters who work for CNN Money) can’t do simple addition. Either Mr. Isidore is an idiot or he is deliberately misreporting the data.

Speaking of which, why would Mr. Isidore only use the last two recessions as an example? Let’s look at the last three recessions instead of the last two.

As you can see, in a similar time frame, the improvement in jobs after the early 1980s recession was head and shoulders above any of the other recessions. In fact, over a comparable period, the early 1980s jobs recovery ran at a rate 4 times faster than the current one. And yet Mr. Isidore cherry-picks a specific stretch of time within a specific set of recessions in order to claim that the jobs growth we’ve seen is “faster and stronger.”
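The post doesn't spell out how the "4 times faster" figure was computed. One reasonable way to put two recoveries on a common footing is to compare their average monthly job gains over the same number of months after each recession ended. Here is a minimal sketch of that approach. The function names are hypothetical, and the sample series are made-up toy numbers chosen so the answer is easy to check.

```python
def avg_monthly_gain(payrolls, start, months):
    """Average monthly change in jobs over `months` months starting at `start`."""
    if months <= 0:
        raise ValueError("months must be positive")
    window = payrolls[start:start + months + 1]
    if len(window) < months + 1:
        raise ValueError("series too short for the requested window")
    return (window[-1] - window[0]) / months


def recovery_speed_ratio(a, a_end, b, b_end, months):
    """How many times faster recovery `a` added jobs than recovery `b`,
    measured over the same number of months after each recession's end."""
    return avg_monthly_gain(a, a_end, months) / avg_monthly_gain(b, b_end, months)


# HYPOTHETICAL toy series, not real BLS data.
recovery_a = [100, 104, 108, 112]
recovery_b = [100, 101, 102, 103]
print(recovery_speed_ratio(recovery_a, 0, recovery_b, 0, 3))
```

Using the same window length for both recoveries is the point. Comparing a strong six-month stretch of one recovery against the whole of another is the kind of cherry-picking the post objects to.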

Long, Slow Recovery vs. Double-Dip Recession

Interesting NYT Economix piece from Casey Mulligan on whether we’re seeing a long, slow recovery from the 2008-09 recession or the beginning of a double-dip recession. One thing he does is predict that national employment and work hours will be “a couple of percentage points higher at the end of 2010 than they are now.”

I think he’s kind of crazy, but I wanted to put it on the record so that we can look back at it at the end of the year.

First of all, the ground rules: Mulligan says that national employment will be “a couple of percentage points” higher. I’m going to assume that this means the employment-population ratio as defined by the BLS, which, as of this writing, is 58.7%. It is up 0.5 percentage points from its low of 58.2% last December. Let’s say that “a couple of percentage points” means an increase of 1.0 percentage point in the employment-population ratio. This gives him quite the benefit of the doubt, I think.

Based on my very rough calculations, that would mean an increase of about 3.06 million jobs between now and the end of the year, an average increase of 437,000 jobs per month. Keep in mind, this would still put us well below the peak employment-population ratio of 64.7% in 2000, and nearly a full percentage point below the ratio of a year and a half ago.
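The "very rough calculation" above can be sketched as follows. The number of jobs needed is the target ratio times the future population, minus the current ratio times the current population. Here "population" is the BLS civilian noninstitutional population aged 16 and over. The post doesn't state the population figures it used, so this sketch takes them as inputs rather than guessing. The monthly pace uses the post's own totals. 3.06 million jobs at about 437,000 a month works out to seven months, which is what the post's figures imply. Function names are hypothetical.

```python
def jobs_needed(pop_now, ratio_now, pop_later, ratio_later):
    """Jobs that must be added to move the employment-population ratio.

    Ratios are fractions (0.587 means 58.7%). Populations are the civilian
    noninstitutional population aged 16+ at each date (inputs, not assumed).
    """
    return ratio_later * pop_later - ratio_now * pop_now


def monthly_pace(total_jobs, months):
    """Average jobs per month needed to add `total_jobs` within `months`."""
    return total_jobs / months


# The post's own totals: about 3.06 million jobs over seven months.
print(f"{monthly_pace(3_060_000, 7):,.0f} jobs per month")
```

Because the population keeps growing, holding the ratio flat already takes job growth. That is why the required total comes out larger than 1% of today's population.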

No. Freaking. Way. Not happening. Honestly, I would be surprised if we saw another 0.5-point increase (which is what we’ve seen in the last 6 months). I’ll stake my claim there, even though I have a feeling we might not even reach that number… I see this recovery being even flatter than that.

In any case, here’s a view of the predictions. We’ll revisit in half a year. Or possibly in October or November, if it looks like one of us is clearly on track.

[Chart: Employment-Population Ratio, 2006 – Present]

Dean Baker Completely Unaware How BLS Calculates Unemployment Rate

Holy crap, how did Dean Baker from the Center for Economic and Policy Research get a job writing about economics?

Check out this line of his report on the recent unemployment numbers:

The unemployment rate fell to 9.7 percent in May, primarily as a result of 411,000 temporary Census jobs.

Mr. Baker is apparently unaware of how we calculate the unemployment rate. Let’s help him out here.

First we take the number of people who have jobs. This is not the “non-farm payrolls” number, which increased by 431,000. It is the “Employed, 16 years and over” number, which decreased by 35,000 (from 139.455 million to 139.420 million). Mr. Baker seems to have those two numbers confused, so I thought I’d clarify.

Then we take the number of people who are looking for jobs but can’t find them. This is where we get the “unemployed” number, which decreased from 15.260 million to 14.973 million.

Then we add the employed number to the unemployed number to get the labor force. To calculate the unemployment rate, we divide the number of unemployed people by the labor force.

139,420,000 + 14,973,000 = 154,393,000 people in the labor force

14,973,000 / 154,393,000 = 9.7% Unemployment rate

Let’s try to prove Mr. Baker’s statement that the unemployment rate dropped to 9.7% “primarily” due to the 411,000 census jobs. We’ll subtract 411,000 from the “employed” number.

139,009,000 + 14,973,000 = 153,982,000 people in the labor force

14,973,000 / 153,982,000 = 9.7% Unemployment rate
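The same two calculations in code make the comparison easy to rerun. This sketch follows the approach above: census workers are simply removed from the employed count. The helper name is hypothetical. The figures are the household-survey numbers quoted in this post. Printing two decimals shows that the difference is about 0.03 points, and both rates round to 9.7%.

```python
def unemployment_rate(employed, unemployed):
    """Unemployment rate as a fraction: unemployed / (employed + unemployed)."""
    labor_force = employed + unemployed
    return unemployed / labor_force


EMPLOYED = 139_420_000    # household survey, "Employed, 16 years and over"
UNEMPLOYED = 14_973_000   # household survey, unemployed
CENSUS_JOBS = 411_000     # temporary census hires cited by Mr. Baker

with_census = unemployment_rate(EMPLOYED, UNEMPLOYED)
without_census = unemployment_rate(EMPLOYED - CENSUS_JOBS, UNEMPLOYED)
print(f"With census jobs:    {with_census:.2%}")
print(f"Without census jobs: {without_census:.2%}")
```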

We get the same 9.7% unemployment rate with or without the census jobs. That is because the unemployment rate dropped due to people leaving the labor force, and most of the people who left the labor force came from the “unemployed” category, otherwise known as “discouraged workers”.

This is part of the reason that economic understanding is so dismal among the general public. An economic reporter should be able to get the simple facts right about a job report.

Oil Spill Simulation Shows Super Crappy Independence Day

UPDATE: Check out Bill’s comments below. It seems that this visualization may be taking us for a ride.

Fascinating computer simulation shows the oil slick wrapping around Florida and basically taking a crap all over the eastern seaboard starting about July 4.

I don’t really care about blame on this issue. That being said, I pretty much blame BP.

More seriously, though, it seems to me (as a totally ignorant observer) that we’re quickly coming to a point where containment of what has already leaked out is just as important as stopping the leak. Is it totally impractical to assume that the US naval reserves might be able to take charge of the slick containment work? Is there any plan to do that?

I don’t know, I’m just asking. If you have anything resembling the answer, I’d love to hear it.

What Happens to Unemployment Tomorrow?

Just thought I’d post this. I’ve always been a little fascinated by the number of people needed to pull off something as huge as the census. And next month we should see the peak of the census employment burst. Observe:

This was done in about 20 minutes, so it might need some explanation.

Basically, I start with the June before the census and mark that number (somewhat arbitrarily) as my base federal employment point. Then I express the employment numbers moving forward from that point as a percentage of that base.
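Here is a minimal sketch of that method under stated assumptions. Federal employment levels come from BLS data supplied by the reader, and "month" means the offset from the base month, the June before the census. The first function turns a census cycle into a percentage-of-base path. The second projects the jobs added in a given month if the current cycle tracks the previous cycle's path. Function names are hypothetical, and the sample numbers are made-up toy values.

```python
def as_percent_of_base(levels, base_index=0):
    """Each level from base_index onward as a percentage of the base month."""
    base = levels[base_index]
    return [level / base * 100 for level in levels[base_index:]]


def projected_change(current_base, previous_path, month):
    """Jobs added in `month` if this census cycle tracks the previous one.

    current_base:  federal employment in this cycle's base month.
    previous_path: the previous cycle's levels as a percentage of its base.
    month:         offset from the base month (must be >= 1).
    """
    if month < 1:
        raise ValueError("month must be at least 1")
    step = previous_path[month] - previous_path[month - 1]
    return current_base * step / 100


# HYPOTHETICAL toy numbers (thousands of federal jobs), not real BLS data.
previous_cycle = as_percent_of_base([2000, 2010, 2100, 2400])
print(projected_change(3000, previous_cycle, 3))
```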

As you can see, if this census year follows the path of the last census year (which it seems to be doing so far) the May employment number coming out tomorrow should add around 300-350 thousand jobs due to the census alone.

How big is a 300-350 thousand job increase? Well, the increase in employment as a whole between March and April was about 250 thousand. So, if the recovery continues as it has been, we should see an increase of something along the lines of 500-600 thousand jobs tomorrow.

Take note, I’m doing really simple guesswork here. I’m pretty sure that geoff over at Innocent Bystanders will have more intelligent things to say on the matter tomorrow.