Monday, December 17, 2012

Readability scores

As a scientist, I write a lot, especially manuscripts and grant proposals. Besides content and structure, one of the aspects that has the greatest impact on whether somebody else will appreciate or even be able to make sense of what we write is the complexity of the language we use. Long and convoluted sentences are off-putting and confusing, and the use of unnecessarily technical terms can come across as pretentious. In the worst case, we may write a text that most of the intended audience cannot understand, especially if we are specialists addressing non-specialists.

We may well have a hunch about this, but if in doubt we can turn to various metrics that have been developed as quantitative tests of readability: the automated readability index, the Flesch reading ease score, the Flesch-Kincaid grade level, the Coleman-Liau index, the Gunning fog index and the SMOG index.

The scores are all calculated from various combinations of the average number of letters per word, the average number of syllables per word, the number of words with three or more syllables, and the average number of words per sentence. With the single exception of the Flesch reading ease score, which nominally runs from 0 to 100 with higher values indicating simpler texts, they are meant to estimate the number of years of formal education generally needed to understand the text. For these grade-level scores, lower values therefore indicate greater ease of understanding: a text scoring around 12 could be expected to be intelligible to the average high school graduate, one scoring 15 to somebody who has obtained a bachelor of science or finished professional training, and one scoring above 20 perhaps only to people with a PhD or an equivalent level of knowledge.
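For the curious, here is a minimal Python sketch of the six formulas in the form in which they are commonly published. It is not how Edit Central or any other tool actually does it: the tokenisation and the syllable counter (runs of vowels, minus a silent final "e") are my own crude assumptions, and the Gunning fog index is simplified by counting every word of three or more syllables as "complex", without the usual exceptions for proper nouns and suffixes. The names TextStats, count_syllables, text_stats and scores are purely illustrative.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class TextStats:
    sentences: int
    words: int
    letters: int        # letters and digits only, no spaces or punctuation
    syllables: int
    polysyllables: int  # words with three or more syllables


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count vowel runs, discounting a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_stats(text: str) -> TextStats:
    # Naive tokenisation: every run of . ! or ? ends a sentence, and a word is a
    # run of letters/digits, optionally joined by hyphens or apostrophes.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = [count_syllables(w) for w in words]
    return TextStats(
        sentences=sentences,
        words=len(words),
        letters=sum(ch.isalnum() for w in words for ch in w),
        syllables=sum(syllables),
        polysyllables=sum(1 for n in syllables if n >= 3),
    )


def scores(s: TextStats) -> dict[str, float]:
    wps = s.words / s.sentences                   # average words per sentence
    spw = s.syllables / s.words                   # average syllables per word
    cpw = s.letters / s.words                     # average letters per word
    letters_per_100 = 100 * s.letters / s.words   # Coleman-Liau "L"
    sentences_per_100 = 100 * s.sentences / s.words  # Coleman-Liau "S"
    return {
        # Higher is easier; nominally 0 to 100, but nothing clamps it.
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        # The remaining five estimate years of schooling; lower is easier.
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "automated_readability_index": 4.71 * cpw + 0.5 * wps - 21.43,
        "coleman_liau_index": 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8,
        "gunning_fog_index": 0.4 * (wps + 100 * s.polysyllables / s.words),
        "smog_index": 1.0430 * math.sqrt(s.polysyllables * 30 / s.sentences) + 3.1291,
    }
```

Calling scores(text_stats(paragraph)) returns all six values at once. Because the syllable heuristic misjudges plenty of English words, the output is best read as a rough indication, and it will not match other calculators to the decimal.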

Obviously, this has to be taken with a grain of salt, especially for short text fragments, because some short words or sentences are nonetheless difficult ("clade") and some long ones are easily understood ("thunderstorm"). Still, the scores have several important applications. Whether one wants to write a biology textbook for grade 8, a leaflet with medical advice for the general public or a brochure with legal information for recent immigrants, one has to take into account the average or perhaps even the minimum capabilities of the intended audience.

Admittedly, if you are a scientist writing primarily for other scientists, you need not worry too much about this. In fact, too simplistic a style with no sentence longer than five words can be just as annoying as an overly convoluted one, not least because the text will have no flow and it becomes hard to make connections and express complex thoughts. But even in scientific publications one can produce more or less unintelligible sentences, and it seems advisable to aim for the "less".

A handy website that automatically calculates the scores for fragments of text is Edit Central. (The site obviously exists to advertise Amazon books on how to become a better writer, but I don't see any harm in that.) So perhaps try it out some time if you are unsure about one of your paragraphs. Even if you have no specific reason to do so now, it is also good fun to play around with a few text fragments and see what scores they get.

Some time ago I was working on a manuscript, and one of my co-authors suggested that the following sentence in the conclusions was too long and convoluted:

The most strikingly distorted perception of relative diversity is found around species-poor but frequently visited Alice Springs and Uluru in the dry centre of the continent, while areas that presumably contain a much larger share of the overall diversity than currently evident in raw observations of species richness are sometimes as close to major herbaria as parts of south-eastern South Australia.

With the tunnel vision that comes from having spent too much time on my project and having lost the ability to take an outsider's perspective, I disagreed at first but ultimately had the sentence's readability scores calculated. In what may be my record so far, four of them suggested that one would need between 26 and 36 years of formal education to make sense of it, and the Flesch reading ease score, which, as you may remember, is supposed to run from 0 to 100, was actually negative. (Only Coleman-Liau was unimpressed, estimating that a mere 16 years of education would suffice.) We had a good laugh, but the sentence did not make it into the final draft of our manuscript.
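Why a score that nominally runs from 0 to 100 can turn negative is easy to see from the Flesch reading ease formula in its commonly published form. A single sentence of 60 words, about the length of the one quoted above, already loses some 61 points to the sentence-length term alone, so only a modest syllable count per word is needed to finish the job:

```latex
\[
\text{FRE} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}
\]
\[
206.835 - 1.015 \times 60 = 145.935
\qquad\Longrightarrow\qquad
\text{FRE} < 0 \iff \frac{\text{syllables}}{\text{words}} > \frac{145.935}{84.6} = 1.725
\]
```

In other words, once such a sentence averages more than about 1.7 syllables per word, which words like "presumably" and "observations" quickly push it towards, the score drops below zero; nothing in the formula clamps it to the nominal range.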

In contrast, the instructions from the Edit Central website are as follows:

This is an interactive section for checking a sample of writing. It is modeled after the ancient Unix utilities style and diction. Enter or copy text into the first box below. The scores to the right give the readability of the text according to various formulas.

These short and simple sentences are rewarded with scores indicating that only six to nine years of education are needed to understand them. That is probably what one should aim for when writing a manual.
