The Budapest Office - Castro Bisztro, Madach ter

Thursday, 2 August 2007

Why You Don't Want to be a Geek (The Flesch-Kincaid Formula is Wrong)

Well, I'm not sure if "Geek" is the right word - perhaps I mean "Nerd"? - but you can see the problem instantly: obsessing about the trivial. Does it really matter which word I use when, whichever I do choose, you know full well (or will do by the time you've read this, or other parts of the blog) that this guy...

a) Is Eccentric (I'm English; "eccentric" is not only acceptable, it is possibly even mildly flattering...)
b) Is Quirky (thanks for that one, Gary)
c) Isn't Two Bricks Short of a Load - in fact probably has two loads of bricks, but doesn't know what to build with them
d) Is Weird

i.e. is, not to beat about the bush, more than slightly geeky, possessed of a high coefficient of nerdulence (let's make that my CoN for future reference)

The nicest reflection I've had on my idiosyncratic disposition is that "Julian doesn't just answer questions that no one else can, he answers questions that no one else even thought of asking."

Sadly the corollary to that is probably "Actually, he answers questions no one else thought worth asking," (some examples later) but...

Here I am, brain the size of a (dwarf) planet and it's full of Stuff! And useless Stuff at that.

Life, don't talk to me about life... (Actually, do talk to me about life - isn't it fascinating?)

However, about those sheep...

What prompts this (hitherto promised but thus far undelivered) piece of navel gazing?

The Flesch-Kincaid readability statistics, Reading Ease and Grade Level as implemented in Microsoft Word (hopefully as per DOD Standard MIL-M-38784B. Detail! The Geek or Nerd must attend to detail... which means, having finally tracked it down, that MIL-M-38784 of July 1995, superseding MIL-M-3978C of October 1990... still with me? ... doesn't in fact contain the Flesch-Kincaid formulas... at least any more. Maybe it did. Who knows?)

And what is the precise problem?

I had Word do the readability statistics for IT - Reading Ease 83.2, Grade Level 5.9. So far so good (in fact rather too good - I don't believe an 11-year-old could read IT). But then I noticed that the formulae:
  • Grade Level = (0.39 * Average Sentence Length) + (11.8 * Average Syllables per Word) - 15.59
  • Reading Ease = 206.835 - ((1.015 * Average Sentence Length) + (84.6 * Average Syllables per Word))
would allow me, given the stats I had, to work backwards and calculate the average number of syllables per word (ASW), which is of course Trivial and therefore, almost by definition, of riveting interest to the Geek or Nerd.

So, I invert the formulae, plug in the values and uh oh! I get two different answers.
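
For anyone who wants to repeat the exercise, the inversion is just a rearrangement of each formula to isolate ASW. Here is a minimal Python sketch, assuming the 15.59 constant quoted above and the Average Sentence Length of 18.6 that Word reports; the function names are made up for illustration and are not anything Word or Excel provides.

```python
# Invert the two Flesch-Kincaid formulas to recover the average
# syllables per word (ASW). Function names are illustrative only.

def asw_from_grade_level(grade_level, asl, constant=15.59):
    """Solve GL = 0.39*ASL + 11.8*ASW - constant for ASW."""
    return (grade_level + constant - 0.39 * asl) / 11.8


def asw_from_reading_ease(reading_ease, asl):
    """Solve RE = 206.835 - (1.015*ASL + 84.6*ASW) for ASW."""
    return (206.835 - reading_ease - 1.015 * asl) / 84.6


# Figures as reported (and presumably rounded) by Word
asl, reading_ease, grade_level = 18.6, 83.2, 5.9

print(round(asw_from_grade_level(grade_level, asl), 3))    # 1.206
print(round(asw_from_reading_ease(reading_ease, asl), 3))  # 1.238
```

If the formulae and the stats were consistent, the two printed values would match; they differ in the second decimal place.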

Check I've got the formulae right in Excel and then that I get the right answers for something (I used the Dr Seuss Green Eggs and Ham stats in the Wikipedia article on Flesch-Kincaid - advance warning/plot spoiler: I think they are using the wrong formula at the moment) - which I do.

So, is Word miscalculating or is the problem that the ASL of 18.6, Reading Ease and Grade Level scores are all inappropriately rounded...?


Some time later...

Word says there are 18.6 words per sentence, but using its own figures for word count and sentence count the average appears to be 19.16 (2 d.p.). Aha! Word's Average Sentence Length is Wrong! But no, that doesn't fix it. Nor does undoing any rounding that may have occurred.

So what's going on? Beats me. I blame Microsoft.

Hastily, and erroneously it seems. I found somewhere else on the web a slightly different version of the Grade Level formula... it should in fact be:
  • Grade Level = (0.39 * Average Sentence Length) + (11.8 * Average Syllables per Word) - 15.9
And then, once you've made allowances for rounding errors in Word's stats, everything seems OK at last (well, almost, to get precise agreement that constant has to become 15.96536ish - I think I'll settle for 15.9, it's close enough)
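
Where that 15.96536ish comes from can be sketched in Python: take the ASW implied by the Reading Ease formula, then rearrange the Grade Level formula to solve for the constant. This assumes Word's reported figures (ASL 18.6, Reading Ease 83.2, Grade Level 5.9) are used as they stand; the variable names are illustrative.

```python
# Find the Grade Level constant that would make both formulas
# agree on ASW, using Word's reported (rounded) figures.

asl, reading_ease, grade_level = 18.6, 83.2, 5.9

# ASW implied by the Reading Ease formula
asw = (206.835 - reading_ease - 1.015 * asl) / 84.6

# Rearranged Grade Level formula:
#   constant = 0.39*ASL + 11.8*ASW - Grade Level
implied_constant = 0.39 * asl + 11.8 * asw - grade_level

print(round(implied_constant, 5))  # 15.96536
```

Since all three inputs are rounded to one decimal place, the digits beyond the first or second decimal of the result mean very little.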

Except that the Word help file says they use the value 15.59.

But at least I can semi-reliably calculate the average number of syllables per word in IT. We'll talk about the ARI, Gunning-Fog and Coleman-Liau indices some other time...

Now wasn't that worthwhile? No. This is why you don't want to be a geek.

