## Thursday, April 17, 2008

### Lowering the bar

I was helping my daughter with some homework the other night. She had been asked to use a spreadsheet program to produce a bar chart. I believe the numbers were densities (g/cm³) and they were something like:
92.5, 91, 93.5, 92
And here's what Excel produced:
The vertical axis starts at 89.5, so the height of each bar represents the density−89.5, which means ... ??
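
To put a number on that, here is a small Python sketch (not part of the original homework; the function names are made up for illustration). It compares the drawn heights of the tallest and shortest bars when the axis starts at 89.5 and when it starts at zero, using the approximate densities quoted above.

```python
# Densities as recalled in the post (approximate values).
densities = [92.5, 91.0, 93.5, 92.0]


def bar_heights(values, baseline):
    """Drawn height of each bar when the vertical axis starts at `baseline`."""
    return [v - baseline for v in values]


def tallest_to_shortest(values, baseline):
    """Ratio of the tallest drawn bar to the shortest drawn bar."""
    heights = bar_heights(values, baseline)
    return max(heights) / min(heights)


truncated = tallest_to_shortest(densities, 89.5)  # axis start used by Excel in the post
from_zero = tallest_to_shortest(densities, 0.0)   # axis starting at zero

print(f"axis from 89.5: tallest bar is {truncated:.2f}x the shortest")
print(f"axis from 0:    tallest bar is {from_zero:.2f}x the shortest")
```

With the truncated axis the tallest bar is drawn at well over twice the height of the shortest, even though the underlying densities differ by less than 3%.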

Junk Charts quotes Naomi Robbins, author of Creating More Effective Graphs thus: "all bar charts must include zero". Indeed—otherwise what do the bar heights represent? That Excel's defaults violate this rule is, ahem, unfortunate. (I've tried this using Excel 2000 and Excel on a Mac, but perhaps it's been fixed in newer versions? Maybe?)

Excel can be coerced into starting its vertical axis at 0, but it takes a fair bit of clicking and navigating. The result is:
Relative to a density of zero, there's very little variation. But perhaps this hides the message in these numbers. Doesn't that just bring us back to the first bar chart? Well ... no.

This graph shows the data, with the vertical axis zoomed in to where the action is. Unlike the original bar chart, it doesn't show bars with arbitrary heights.

Again from Junk Charts:
> The "start-at-0" rule says that the vertical axis of any graph ought to start at value 0. The rule was mentioned in Huff's classic booklet, "How to Lie with Statistics": as the name implies, the rule is intended to eradicate mischievous graphs that exaggerate small differences by not starting at 0, which is to say, by choosing a misleading scale.
>
> Others, like Tufte and Wainer, have long realized that the start-at-0 rule is not absolute ... My own "anti-rule" stipulates that if all data appearing in a chart are far from 0, then don't start at 0.
>
> If, on the other hand, some of the plotted data are close to 0, then it is essential to start at 0.

This isn't too far from my view, but it doesn't address bar charts, which are a special case because they emphasize the heights of the bars, rather than the position of the tops of bars. Bar charts are only appropriate for variables that are measured on ratio scales. For such variables, there is a non-arbitrary zero, which means that you can calculate a meaningful ratio; e.g. for weight: one thing might weigh twice as much as another. But some variables aren't like that; e.g. IQ: an IQ of zero is meaningless, and so it doesn't make sense to say that someone with an IQ of 100 is twice as intelligent as someone with an IQ of 50. For variables of this kind bar charts make no sense at all.

So, if your variable isn't ratio scaled (in other words, there isn't a meaningful zero), don't use a bar chart. If it is ratio scaled and you decide to use a bar chart, make sure your axis starts at zero.

Derek puts it well in a comment at Pictures of Numbers:
> There is a circumstance in which the would-be grapher absolutely must start with zero, and that's when creating a bar graph. If that causes problems, it's time to consider abandoning the bar graph and adopting something which doesn't need a zero on the scale. I've seen bar graphs where the designer recognised the problem with zero, adopted and defended the solutions, but without getting rid of the bar graph format. Those wavy gaps are the least bad of the abortive compromises resorted to by people who won't give up their bars.

In case anyone thinks this really isn't much of an issue, here are some examples I found quite easily:

## Saturday, April 12, 2008

### Food for thought

The global price of food has risen sharply over the last 18 months. This is most acutely the case with cereals. The New York Times reports that wheat has reached its highest price in 28 years. The reasons for this phenomenon seem to be broadly accepted; see, for example, Paul Krugman's column or a recent presentation (pdf) by Joachim von Braun of the International Food Policy Research Institute.

Though the relative importance of the reasons is difficult to assess, the list itself seems clear (the price of oil, a growing middle class in China and India with an increasing demand for meat which requires more grain for feed, droughts likely due to climate change, Western government subsidies for biofuels like corn ethanol).

But I wonder if we shouldn't consider a different aspect of this. As the New York Times points out:
> Even the poorest fifth of households in the United States spend only 16 percent of their budget on food. In many other countries, it is less of a given. Nigerian families spend 73 percent of their budgets to eat, Vietnamese 65 percent, Indonesians half.

What is wrong with our world that so many people are living so close to the edge? Hmmm ...

Update 14Apr2008: The graph below was produced using Technorati. It shows the number of blog posts (in "any language" on blogs with "some authority") containing "food crisis". Too bad most of us are at least 6 months late.

## Friday, April 11, 2008

The Internet makes it possible to link a dispersed community of common interest. Now there are a number of blogs that focus entirely or in part on Statistics, but they seem not to be well connected.

So I've set up a social bookmarking website just for applied statistics, data analysis, and visualization. It's called StatLinks.

It lists links that users submit, and allows other users to vote on their relevance. Links are listed in order of popularity (or in chronological order, if you prefer).

I encourage people to visit StatLinks, to submit links that are likely to be of interest, and to pass the word! I've put a few links in to get things started. (Hat tip to Slinkset whose technology made it a breeze to set this up.)

## Thursday, April 10, 2008

### Could you keep my place in line?

Line-ups are both eminently civilized and—really annoying! The first in first out (FIFO) principle is inherently egalitarian and respect for it is a sign of social order. But there's something crazy about using our bodies as place keepers in a queue, sometimes for hours on end.

Inevitably, after waiting some time in a lineup, someone will need to step out for a while. Rather than lose one's priority in the sequence, the convention is to ask someone (a complete stranger if need be), "Could you keep my place in line?"

The language here is metaphorical and indirect. The request is not really about keeping a place. It's about promising that, when the person returns, you will vouch to any potential challengers that this particular person was indeed in line at this particular point in the sequence.

The fact is, complete strangers generally do agree to "keep your place in line". And that's a further sign of civil behaviour. Maybe line ups aren't so bad after all!

I bet there are lots of good stories about line-ups. I'd love to hear some. Then we could publish a book (I'm trying to think of a queued name for it ...)

P.S. I've tried to give equal time to the different spellings lineup / line-up / line up. I really don't know which is correct. Those who wish to correct me should form an orderly line.


## Tuesday, April 08, 2008

### Nature vs. not sure

The perennial nature-vs-nurture debate just won't go away. This is particularly true with regard to gender differences, a subject of broad interest.

I'll acknowledge my biases up front. I have long been skeptical about biological determinism. This is partly because of its historical association with racism, sexism, classism, and the eugenics movement. But it's also because, particularly in recent years, there has been a tendency to overstate the importance of genetics in explaining human behaviour. Part of the explanation for this "genohype" may be the dramatic achievements of the Human Genome Project together with the rise of the biotechnology sector. Just as the success of Darwin's theory of natural selection led to Social Darwinism, today's molecular genetics revolution has put a new wind in the sails of biological determinism.

In the scientific world, the nature-vs-nurture debate is generally accepted to be an ill-posed problem. Because the environment affects the expression of genes, it is not a question of nature versus nurture, but of nature vis-à-vis nurture. Nevertheless, the ways in which and the extent to which nature and nurture influence human behaviour remain controversial. And beliefs about this can have profound consequences.

But one thing's for certain, and that's uncertainty. Despite the way results from studies of gender differences are often portrayed, we're usually left with more questions than answers. Here I want to comment briefly on two considerations that should be borne in mind.

Does the difference matter?

It's common to read reports stating that, for example, "women perform task X better than men". What this really means is "on average women perform task X better than men, and this effect was found to be statistically significant". The magnitude of the effect may be small or large. The degree of overlap between women and men may be small or large. (And of course the study may be flawed.)
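
To get a feel for how much two groups can overlap even when their averages differ, here is a rough Python sketch. It rests on assumptions that come from me rather than from any particular study: scores in both groups are normally distributed with the same spread, and the standardized differences `d` are hypothetical values picked for illustration. The function name is made up too.

```python
from statistics import NormalDist


def overlap_for_effect_size(d):
    """Overlap (between 0 and 1) of two normal distributions with equal
    spread whose means differ by d standard deviations (an assumed model)."""
    group_a = NormalDist(mu=0.0, sigma=1.0)
    group_b = NormalDist(mu=d, sigma=1.0)
    return group_a.overlap(group_b)


# Hypothetical standardized differences between the two group means.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: distributions overlap by {overlap_for_effect_size(d):.0%}")
```

Under these assumptions, even a difference of half a standard deviation leaves most of the two distributions overlapping, so a statistically significant difference in averages says little about any given woman or man.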

To what can the difference be attributed?

Assuming the difference is real and meaningful, we're still left with the question of whether it represents an innate biological difference or an environmental (cultural) difference. For some reason it seems that people quickly jump to the conclusion that gender differences are innate. But in most cases it is extremely difficult to sort this out. Cultural effects can be extremely subtle. As has been pointed out (by ?), the concept of "wet" wouldn't mean much to a fish.

Grist for the mill

Here are three interesting articles that touch on some of these issues. First, a review by Viv Groskop of "The Sexual Paradox: Troubled Boys, Gifted Girls and the Real Difference Between the Sexes" by Susan Pinker. Next, an interview with professor of language and communication Deborah Cameron about her book "The Myth Of Mars And Venus". Finally, a New York Times article by Elizabeth Weil about the movement for single-sex public education based on gender differences.

I've really only scratched the surface of this issue (not to mention related ones), and there's lots of stuff out there (a Google search of "gender differences" gives 2,450,000 results). Comments?

Update 09Apr2008: It seems there's an almost unlimited number of links that could be added. Here's another review of Susan Pinker's book, from the New York Times. Here's an entertaining retort to an argument about gender differences based on evolutionary psychology. And here's a piece that argues: "Nowhere do scientific findings get more mangled than when they’re about the differences between men and women." Finally, here's a conservative view on gender differences.

Update 11Apr2008: Here's a response to some of the arguments about single-sex schooling.