Readability measures for law firm surveys

Let’s consider a few more readability measures.

  1. The Bormuth Readability Index (BRI) estimates the grade level required to read a text based on 1) character count (average word length in characters) rather than syllable count and 2) the average number of familiar words in the sample text. The BRI uses the Dale-Chall word list to count familiar words in samples of text. The result translates to a U.S. grade level. For example, a result of 10.6 means students in 10th grade and above can read and comprehend the text.
  2. The Danielson and Bryan formula is most concerned with the variables of the letters themselves. It uses the number of characters per space and the number of characters per sentence. From the R package koRpus: DB_1 = \left( 1.0364 \times \frac{C}{Bl} \right) + \left( 0.0194 \times \frac{C}{St} \right) - 0.6059 and DB_2 = 131.059 - \left( 10.364 \times \frac{C}{Bl} \right) - \left( 0.194 \times \frac{C}{St} \right), where Bl means blanks between words, which is not really counted in this implementation but estimated by words - 1, St is the number of sentences, and C is interpreted as literally all characters.
  3. The Degrees of Reading Power (DRP) test purportedly measures reading ability in terms of the “hardest text that can be read with comprehension”. Grades 6-8 can read and comprehend text with a DRP of 57-67, Grades 9-10 can handle DRPs of 62-72, Grades 11-12 can handle 67-74, and college graduates and above can handle a DRP above 70. The DRP is derived from the Bormuth Mean Cloze Score (B_{MC}): DRP = (1 - B_{MC}) \times 100. This formula itself has no parameters.
  4. Fang’s Easy Listening Formula (ELF) focuses on the proportion of polysyllabic words in a text. ELF is calculated by counting the number of syllables and the number of words in a sentence: ELF = S - W, where S and W are the number of syllables and words in a sentence, respectively. Because every word has at least one syllable, the result is the number of syllables beyond the first in each word, so the formula punishes every extra syllable.
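
The Danielson and Bryan, DRP, and ELF formulas listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample counts are hypothetical, the counts are supplied by hand rather than extracted from text, and blanks are estimated as words - 1 as in the koRpus description.

```python
def danielson_bryan(characters, words, sentences):
    """Return (DB_1, DB_2) from raw counts (hypothetical helper)."""
    blanks = words - 1  # estimate of blanks between words, as described above
    if blanks <= 0 or sentences <= 0:
        raise ValueError("need at least two words and one sentence")
    chars_per_blank = characters / blanks
    chars_per_sentence = characters / sentences
    db1 = 1.0364 * chars_per_blank + 0.0194 * chars_per_sentence - 0.6059
    db2 = 131.059 - 10.364 * chars_per_blank - 0.194 * chars_per_sentence
    return db1, db2


def degrees_of_reading_power(bormuth_mean_cloze):
    """DRP = (1 - B_MC) * 100, with B_MC given as a fraction."""
    return (1 - bormuth_mean_cloze) * 100


def easy_listening_formula(syllables, words):
    """ELF = S - W for a single sentence."""
    return syllables - words


# Made-up counts, purely for illustration.
db1, db2 = danielson_bryan(characters=500, words=101, sentences=5)
print(round(db1, 2), round(db2, 2))
print(round(degrees_of_reading_power(0.4), 1))
print(easy_listening_formula(syllables=24, words=13))
```

Passing counts instead of raw text keeps the sketch free of any particular tokenizer or syllable counter, which is where real implementations tend to differ.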

Readability measures and surveys by law firms

Many other readability measurements have been devised. The plot below shows the Automated Readability Index (ARI), Coleman-Liau Index, and the Simple Measure of Gobbledygook (SMOG) applied to six reports by U.S. law firms. First, we will briefly explain the three measures.

The Automated Readability Index (ARI) assesses the grade level needed to comprehend the text. For example, if the ARI outputs the number 10, this equates to an assessment that a high school student in the tenth grade of schooling, ages 15-16 years, should be able to comprehend the text. The formula to calculate the Automated Readability Index is 4.71(characters/words) + 0.5(words/sentences) – 21.43.
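
As a rough sketch, the ARI formula can be written as a small Python function. The function name and the sample counts are hypothetical, and the counts are assumed to have been tallied beforehand.

```python
def automated_readability_index(characters, words, sentences):
    """ARI = 4.71 * (characters/words) + 0.5 * (words/sentences) - 21.43."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Made-up sample: 500 characters, 100 words, 5 sentences.
print(round(automated_readability_index(500, 100, 5), 2))
```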

The Coleman-Liau Index looks at the average number of letters per 100 words (L) and the average number of sentences per 100 words (S). The formula to calculate the Coleman-Liau Index is 0.0588L - 0.296S - 15.8. This translates to a grade, so that, for example, a 10.6 means roughly appropriate for a 10th- to 11th-grade high school student.
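
A minimal sketch of the Coleman-Liau calculation follows. The function name and sample counts are hypothetical. The code first converts raw totals into the per-100-word averages L and S used by the formula.

```python
def coleman_liau_index(letters, words, sentences):
    """Coleman-Liau = 0.0588 * L - 0.296 * S - 15.8."""
    if words <= 0:
        raise ValueError("words must be positive")
    letters_per_100 = letters / words * 100      # L
    sentences_per_100 = sentences / words * 100  # S
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Made-up sample: 450 letters, 100 words, 5 sentences.
print(round(coleman_liau_index(450, 100, 5), 2))
```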

The Simple Measure of Gobbledygook (SMOG) is based on word length and sentence length being multiplied rather than added, as in other readability formulas. The SMOG formula correlates 0.88 with comprehension as measured by reading tests. The SMOG formula is SMOG grading = 3 + the square root of the polysyllable count, where polysyllable count = number of words of more than two syllables in a sample of 30 sentences. This next table translates the higher levels of SMOG to an approximate grade level.
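
The SMOG grading described above can be sketched as follows. The function name is hypothetical, and the polysyllable count is assumed to come from a 30-sentence sample that has already been tallied by hand or by another tool.

```python
import math


def smog_grade(polysyllable_count):
    """SMOG grading = 3 + sqrt(polysyllable count in a 30-sentence sample)."""
    if polysyllable_count < 0:
        raise ValueError("polysyllable count cannot be negative")
    return 3 + math.sqrt(polysyllable_count)


# Made-up sample: 49 words of more than two syllables in 30 sentences.
print(smog_grade(49))  # 3 + 7 = 10.0
```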

The Foley Lardner report has a much higher total score than the other five reports as its estimated grade level is the highest on all three measures. The Dykema Gossett report, by contrast, aims for a less sophisticated audience.


Readability of survey reports with the Flesch-Kincaid assessment

We previously looked at the Flesch reading-ease test. A cousin assessment, the “Flesch-Kincaid Grade Level Formula” (Flesch-Kincaid), also calculates a readability score and also expresses it as a U.S. school grade level. It can be thought of as the number of years of education generally required to understand the text. The sentence, “The Australian platypus is seemingly a hybrid of a mammal and reptilian creature” yields an 11.3 grade level as it has 24 syllables and 13 words.

The grade level is calculated with the following formula:

0.39 \left( \frac{\text{total words}}{\text{total sentences}} \right) + 11.8 \left( \frac{\text{total syllables}}{\text{total words}} \right) - 15.59
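
As a quick check, the formula can be applied to the platypus sentence quoted earlier (one sentence, 13 words, 24 syllables). The function name is hypothetical and the counts are those given in the text, not computed by the code.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Grade = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Platypus sentence: 13 words, 1 sentence, 24 syllables.
grade = flesch_kincaid_grade(words=13, sentences=1, syllables=24)
print(round(grade, 1))  # 11.3
```

The syllables-per-word term contributes about 21.8 of the total here, against about 5.1 from sentence length, so for a single short sentence the word-length term dominates.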

According to Wikipedia, the different weighting factors for words per sentence and syllables per word in the Flesch reading-ease test and the Flesch-Kincaid Grade Level Formula mean that the two assessment tools are not directly comparable and cannot be converted. The grade level formula of Flesch-Kincaid emphasizes sentence length over word length. Due to the formula’s construction, the score does not have an upper bound.

To the three British reports considered previously we added two US reports: Eversheds 21stCentury 2008 and Morrison Foerster GCsup 2016. The topic of the two additions is also the legal industry. Basic statistics about the five reports can be seen in the next table.

The Flesch-Kincaid grade level stands at the college sophomore level for the reports of CMS and KL Gates, is even higher (postgraduate level) for Morrison Foerster, and reaches the level of a PhD program for Allen Overy.

Readability of reports (Flesch reading-ease test)

All survey reports share a characteristic that can be measured: how understandable their prose is. Many measures exist for assessing readability, including the long-established Flesch reading-ease test (FRET). With FRET, higher scores indicate a survey report that is easier to read; lower scores indicate reports that are more difficult to read. The formula for the Flesch reading-ease score is

206.835 - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right) - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)

According to Wikipedia, “Reader’s Digest magazine has a readability index of about 65, Time magazine scores about 52, … and the Harvard Law Review has a general readability score in the low 30s.” FRET scores correspond to the reader’s school level shown in the first table below.
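
To make the scoring concrete, here is a minimal sketch of the Flesch reading-ease calculation using the commonly published coefficients (206.835, 1.015, and 84.6). The function name and the sample counts are hypothetical, and the counts are assumed to have been tallied beforehand.

```python
def flesch_reading_ease(words, sentences, syllables):
    """FRET = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Illustrative counts: one sentence of 13 words and 24 syllables.
print(round(flesch_reading_ease(words=13, sentences=1, syllables=24), 1))
```

Unlike the grade-level measures, a higher result here means easier text, so the two kinds of score move in opposite directions.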

To investigate FRET scores, we selected three survey reports that share two important characteristics: first, the topic of each report is the legal industry itself, and second, three different British law firms produced them. The three reports are CMS GCs 2017, Allen Overy Innovative 2012, and Eversheds 21stCentury 2008. The objective was to analyze text about a similar topic written by firms of a similar national and linguistic background.

Because the cover has sparse text and the final page often has mostly contact information, we removed those two pages and then extracted all the text. The table below provides basic information about each report’s text. Thus, from the third table, all of the reports are written at the level of a well-educated reader. The graph that follows shows another aspect of these three reports: the number of words per report page (less the cover and back page). The results might be called a measure of cognitive density: how much text information is crammed into each page. Note that the length of the report does not correspond to how many words are on each page on average.