Creating the right data map

280 lines of code (R)

12 Jan 2019

Information with a geographical element can best be visualised with a map. However, big regions tend to dominate maps independent of their actual importance. I show possible ways around this issue and let you generate the right data map for your own purposes without needing to code.

alt text

The problem with standard data maps (choropleth maps)

When visualising data, it’s worth considering more than just whether the data display is correct. Different kinds of visualisations invite different interpretations in the minds of the reader.

Let me explain what I mean by showing different visualisations of the same geographical data: 4G mobile internet coverage in Germany. We focus on my provider o2 which happens to offer the worst as well as the cheapest network in Germany. A standard data map would look like this:

alt text

The 16 federal states of Germany are coloured according to the 4G coverage o2 provided in May 2018 according to the website allnet-flatrate.net. Dark blue colours represent good 4G coverage of up to 97% in Berlin while light blue colours imply patchy coverage down to 41% in Brandenburg.

A standard data map like this is called a choropleth map and even though it is formally correct I often find it misleading. Why? Because, readers usually assume big things to be more important than small things. In the case of maps big regions appear more important even though they might be home to fewer people than smaller regions.

Standard choropleth maps size regions according to area while readers mistake size for importance. How can one correct this?

Solution 1: Sized data points in space (bubble maps)

First of all, one could size abstract data points according to importance. In our case this would be equivalent to using state population as the size variable instead of area above. By positioning the data points on a map, the reader quickly sees which region’s data each point represents.

alt text

The data and the colour coding are the same as before. The only difference is the switch from a choropleth map to a bubble map. Suddenly the situation looks better for o2 customers. The darker shades of blue dominate. The lighter shades of blue don’t stand out anymore.

City states like Hamburg or Berlin are sized bigger than before while geographically big states with few inhabitants are sized smaller than before.

Solution 2: Sized states (noncontinuous cartogram)

A similar solution would take the states directly and size them according to population. Gaps appear where geographically big states get sized down due to small popultions.

alt text

Again, the data and the colour cording are the same as before. Unfortunately, states with a similar geographical midpoint like Berlin and Brandenburg in the north-east of Germany now overlap. The next solution avoids this.

Solution 3: Sized states (continuous cartogram)

The final solution I will offer here looks perhaps the most impressive. States are again sized according to popultion in order to mirror their importance. However, this time, the resulting gaps are filled.

alt text

Even though this sort of cartogram looks nice, it takes away the primary objective of maps: orientation in space. Because states get stretched and squished, it takes time to understand what one is looking at.

Creating your own data maps

The way I see it, there is no one size fits all solution here. Which approach do you prefer?

I made an interactive shiny web app, so you can create the data map you deem best. You can find it here. I wrote it using R. The script is on github here.

The challenge of data visualisation goes beyond representing the data truthfully and beautifully. I see the real challenge in bringing the reader to understand the data correctly without effort. Meeting this challenge requires thinking about the data and the reader when visualising data.

Like this post? Share it with your followers or follow me on Twitter!