Let’s give some thought to what happens when a survey collects answers in ranges but the surveyor wants to report averages or medians. The two-panel image below illustrates the point. It comes from Davies Ward Phillips & Vineberg, “2010 In-House Counsel Barometer” at 9 (2010).
Considering the plot on the left, a reader should assume that the survey question was as stated at the bottom of the graphic: “Question: How long have you been a lawyer?” and that four selections were available for respondents to check. They were, one assumes, “<5 years”, “5-9 years”, “10-19 years” and “20+ years”. Hence the pie chart has a slice for each of those four bins.
If those assumptions are right, however, the firm could not have stated above the plot that “On average, in-house counsel have practiced law for 16.3 years …”. When a survey collects answers only in ranges, no analyst can calculate an exact average from that aggregated data; the most one can do is estimate it, for example by assuming every respondent sits at the midpoint of a bin. Knowing that 17% of the respondents have practiced law between five and nine years does not tell us the average within that single range, let alone across all four categories, and the open-ended “20+ years” bin has no midpoint at all. So Davies Ward must have asked for a numeric answer on the questionnaire and created the four bins afterwards.
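To make the point concrete, here is a minimal Python sketch of how little binned shares pin down an average. The bin edges follow the four assumed selections; every share except the 17% for “5-9 years” is invented for illustration, and the function name `mean_bounds` and the `cap` parameter are hypothetical, not anything from the Davies Ward report.

```python
# Bins as (lower, upper) in years; None marks the open-ended "20+ years" bin.
# ASSUMPTION: only the 0.17 share comes from the report; the other shares
# are made up so that the four add to 1.
def mean_bounds(bins, shares, cap=None):
    """Return the (lowest, highest) mean consistent with the binned shares."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    # Lowest possible mean: everyone sits at the bottom edge of their bin.
    low = sum(lo * s for (lo, _), s in zip(bins, shares))
    # Highest possible mean: everyone sits at the top edge of their bin.
    high = 0.0
    for (_, hi), s in zip(bins, shares):
        top = cap if hi is None else hi
        if top is None:
            if s > 0:
                # An open-ended bin with no assumed cap has no upper bound.
                return low, float("inf")
            continue
        high += top * s
    return low, high

bins = [(0, 5), (5, 10), (10, 20), (20, None)]
shares = [0.10, 0.17, 0.38, 0.35]  # hypothetical except 0.17

low, high = mean_bounds(bins, shares)
low_capped, high_capped = mean_bounds(bins, shares, cap=45)  # assumed 45-year cap
print(round(low, 2), high, round(high_capped, 2))
```

With these made-up shares the average could fall anywhere in a range more than a dozen years wide, even after assuming a cap on the top bin. A figure as precise as 16.3 years therefore has to come from the underlying numeric answers.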
Why didn’t the firm share the more detailed information? After all, when analysts bin data, they jettison information. Furthermore, subjectivity enters whenever someone chooses the categories, whether on the questionnaire or after the fact.
It would have been better to create a scatter plot and thereby show all the data points. That way, readers can draw their own conclusions about the pattern of the distribution.
Sometimes surveyors worry that individual points on a scatter plot could be tied to a specific respondent (like the longest-practicing lawyer or the highest-paid lawyer). But analysts can sidestep such concerns with a box plot, which tells more than the percentages in bins.
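As a rough illustration of what a box plot summarizes, the sketch below computes a five-number summary with Python’s standard `statistics` module. The years-of-practice values are hypothetical, as is the helper name `five_number_summary`; the quartile method (“inclusive”) is one common convention among several.

```python
import statistics

def five_number_summary(values):
    """Return min, quartiles and max: the figures a box plot draws."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }

# ASSUMPTION: invented years-of-practice answers, not the survey's data.
years = [1, 3, 4, 6, 8, 11, 12, 15, 16, 19, 22, 25, 31]
summary = five_number_summary(years)
print(summary)
```

The quartiles and median describe the shape of the distribution without plotting any one respondent. The minimum and maximum still reflect individual answers, so a cautious surveyor might end the whiskers at chosen percentiles instead.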