# Visualize variables in surveys with Sankey diagrams

Let’s say we would like to understand and visualize how the number of survey reports varies by country, page orientation, and involvement of a co-coordinator. A Sankey diagram (also known as a river plot) shows such breakdowns as flows whose widths are proportional to the counts they carry.

Consider a data set of 174 research-survey reports. For each report we know the headquarters country of the law firm, or that the firm is a “VereinCLG” (a firm organized as either a Swiss verein or a “company limited by guarantee” (CLG)). For the 9 surveys by Canadian law firms, 48 by UK law firms, 109 by U.S. firms, and 48 by VereinCLGs, we also know whether the report was in portrait or landscape orientation and whether the firm teamed with a co-coordinator or surveyed on its own.

Starting at the left of the Sankey diagram below, the heights of the four rectangles show the relative proportions of surveys by country. Each rectangle then divides into two streams: the top stream flows into the Portrait orientation rectangle and the bottom stream flows into the Landscape rectangle. In the middle of the plot, the green rectangles indicate by their relative heights the proportions of portrait and landscape reports. Two streams flow from each of the orientation rectangles, the top one indicating the proportion of reports that did not have a co-coordinator (FALSE) and the lower one the proportion that did (TRUE). Again, the relative heights of the right-most rectangles suggest the proportions.
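To make the idea of rectangle heights concrete, here is a minimal Python sketch of how the share of reports in each category of a column could be computed. The records are invented for illustration and are not the 174 reports discussed above; the name `stage_proportions` is likewise hypothetical and is not taken from the code that drew the diagram.

```python
from collections import Counter

# Hypothetical records (country, orientation, has_co_coordinator).
# These are made-up rows for illustration, NOT the article's data.
reports = [
    ("UK", "Portrait", False),
    ("UK", "Portrait", True),
    ("UK", "Landscape", False),
    ("U.S.", "Portrait", False),
    ("U.S.", "Landscape", True),
    ("Canada", "Portrait", True),
]


def stage_proportions(records, column):
    """Return each category's share of all records in one column.

    In a Sankey diagram, these shares correspond to the relative
    heights of the rectangles drawn for that column.
    """
    counts = Counter(rec[column] for rec in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


country_share = stage_proportions(reports, 0)      # left-hand rectangles
orientation_share = stage_proportions(reports, 1)  # middle rectangles

for label, share in country_share.items():
    print(f"{label}: {share:.0%}")
```

Each column is summarized independently of the others, which is why a rectangle’s height alone says nothing about how its reports are distributed in the other columns; that information is carried by the streams.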

Consider the reports published by UK law firms. They are mostly portrait: that stream is much thicker than the narrow stream pouring down into the “Landscape” rectangle at the bottom. But the Portrait and Landscape rectangles combine the data of all the countries, and the streams leaving them no longer distinguish one country from another, so I don’t think it is possible from this Sankey diagram to say what proportion of UK reports involved a co-coordinator. That said, across all the portrait reports, somewhat fewer had co-coordinators than not, although the balance was roughly even.

However, by swapping two words in the code that produced the first Sankey diagram, we created the variation below, which shows what proportion of a country’s reports involved a co-coordinator. It appears that the UK reports are approximately evenly divided between those with a co-coordinator and those without one.
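The code behind the diagrams is not shown here, so the following Python sketch only illustrates why the order of the columns matters. It assumes that streams are counted between adjacent columns only. The records, the field names, and the function `adjacent_flows` are all hypothetical.

```python
from collections import Counter

# Hypothetical records for illustration only, NOT the article's data.
reports = [
    {"country": "UK", "orientation": "Portrait", "co_coordinator": False},
    {"country": "UK", "orientation": "Portrait", "co_coordinator": True},
    {"country": "UK", "orientation": "Landscape", "co_coordinator": False},
    {"country": "UK", "orientation": "Portrait", "co_coordinator": True},
    {"country": "U.S.", "orientation": "Portrait", "co_coordinator": False},
    {"country": "U.S.", "orientation": "Landscape", "co_coordinator": True},
    {"country": "Canada", "orientation": "Portrait", "co_coordinator": True},
]


def adjacent_flows(records, stages):
    """Count the reports flowing between each pair of neighboring columns.

    Keys have the form (left_stage, left_value, right_stage, right_value).
    """
    flows = Counter()
    for rec in records:
        for left, right in zip(stages, stages[1:]):
            flows[(left, rec[left], right, rec[right])] += 1
    return flows


# Original column order: country -> orientation -> co-coordinator.
first = adjacent_flows(reports, ["country", "orientation", "co_coordinator"])

# Swapping the last two column names links country directly to co-coordinator.
swapped = adjacent_flows(reports, ["country", "co_coordinator", "orientation"])

# Only the swapped order has streams from a country to TRUE/FALSE.
uk_direct = {
    key[3]: n
    for key, n in swapped.items()
    if key[0] == "country" and key[1] == "UK"
}
print(uk_direct)
```

With the first ordering there is no stream from a country to the co-coordinator column at all, so the per-country split cannot be read from the plot. With the swapped ordering those streams exist, at the cost of losing the direct country-to-orientation view.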

