Sometimes data describe places and are best shown on a map. A very common type of data describes areas such as municipalities or countries, and that is the only type we cover here. To learn more, you should take a course on geographical information systems, such as SGO1910.
A map will typically be stored in a special data format, and you can plot it using ggplot in a similar way as in previous exercises. We need some special functions to handle map data, and we will use the sf package. ‘sf’ stands for simple features, but there is nothing simple about it, of course.
Here is a map of the districts of the city of Oslo.
ggplot(oslo_trangbodd) +
  geom_sf(fill = NA) +
  theme_void()
A map such as this is basically a lot of points defined by x and y coordinates, and a line drawn from point to point in a closed circuit. Such drawn areas are sometimes called polygons. Thus, such maps can be stored as a data frame with two variables for coordinates. But there has to be some additional structure. Let’s look at the data using glimpse():
glimpse(oslo_trangbodd)
## Rows: 16
## Columns: 4
## $ bydel <chr> "030112", "030109", "030105", "030101", "030102", "030110",…
## $ bydelsnavn <chr> "ALNA", "BJERKE", "FROGNER", "GAMLE OSLO", "GRÜNERLØKKA", "…
## $ trangbodd <dbl> 28.7, 26.3, 15.5, 23.3, 23.0, 26.8, 16.8, 17.2, 10.4, 22.6,…
## $ geometry <MULTIPOLYGON> MULTIPOLYGON (((268660.9 66..., MULTIPOLYGON (((26…
This looks a bit like an ordinary data file, and the variables in this dataset are simply the city district code (“bydel”), the district’s name, and the proportion of residents who live in cramped conditions (that is, in houses or apartments that are small relative to the size of the family).
But then there is a variable “geometry” which looks rather cryptic. Without going into details, this column holds all the values associated with each polygon. While an ordinary dataset would have one value in each cell of the data matrix, this particular column holds many. (An sf object is thus not an ordinary data.frame, although it looks similar.)
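To make the idea of a geometry column more concrete, here is a minimal sketch that builds a tiny sf object by hand. The two square "districts", their codes and their values are made up for illustration and have nothing to do with the Oslo data.

```r
library(sf)

# Two hypothetical square districts. Each polygon is a closed ring of x/y
# points, so the first and last row of each matrix are identical.
square_a <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
square_b <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))

# Combine ordinary columns with a geometry column into an sf object.
toy_map <- st_sf(
  data.frame(bydel = c("A", "B"), value = c(10.5, 20.1)),
  geometry = st_sfc(square_a, square_b)
)

class(toy_map)           # an sf object is built on top of a data frame
st_coordinates(toy_map)  # all the x/y points hidden inside the geometry column
```

Each row has a single cell in the geometry column, but that cell holds every corner point of the polygon.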
It is fairly straightforward to make a map using ggplot. All you have to do is to specify a different geom_*. The geom_sf() function will look up the geometry column in the data frame and plot the polygons. Thus, you do not need to specify any aes() at this stage.
ggplot(oslo_trangbodd) +
  geom_sf()
The dataset includes a variable indicating the proportion of the population in each area who live in cramped conditions. We can show this on a map by specifying that the fill color should use that variable. In addition, we simplify the graph by adding theme_void(), as explained in an earlier chapter.
ggplot(oslo_trangbodd) +
  geom_sf(aes(fill = trangbodd), color = "white") +
  theme_void()
Importantly: when the fill color is specified by a variable, it is an aesthetic and is specified inside aes(). In the example above, note that fill = is set by a variable as an aesthetic, while the line color is set to a single color outside aes(). Both fill = and color = can be inside aes() or not, depending on what you want to do. It is also worth noting that aesthetics are typically specified in the first line, but can also be specified inside any geom_*(), as done here.
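The difference between a mapped and a fixed color can be sketched side by side. Since the Oslo data are not available outside this tutorial, the sketch uses a stand-in: the North Carolina counties shapefile that ships with the sf package, and its column BIR74 (births in 1974). The object names p_mapped and p_fixed are just illustrative.

```r
library(sf)
library(ggplot2)

# Stand-in data: North Carolina counties bundled with sf (not the Oslo data).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# fill is mapped to a variable (inside aes), border color is fixed (outside aes)
p_mapped <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), color = "white") +
  theme_void()

# both fill and border color are fixed, so every county looks the same
p_fixed <- ggplot(nc) +
  geom_sf(fill = "grey80", color = "white") +
  theme_void()

p_mapped
p_fixed
```

Only the first plot gets a legend, because only there does the fill carry information from the data.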
The default color choice for a continuous variable is a gradient of blue colors, and you get a legend explaining the scale.
Color scales can also be specified as a gradient between two colors, for instance from red to yellow. To do so, you add another line with scale_fill_gradient().
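A red-to-yellow scale might look like the sketch below. It again uses the North Carolina shapefile bundled with sf as a stand-in for the Oslo data, so the variable BIR74 and the object name p_red_yellow are assumptions made for the sake of a runnable example.

```r
library(sf)
library(ggplot2)

# Stand-in data: North Carolina counties bundled with sf (not the Oslo data).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Low values are drawn in red, high values in yellow.
p_red_yellow <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  scale_fill_gradient(low = "red", high = "yellow") +
  theme_void()

p_red_yellow
```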
Red and yellow might look very alarming and a bit too dramatic. Try something calmer, such as setting low to “white” and high to “black”. Try also combinations of blue, green, red, yellow, purple etc. Some scales will look good - others plain silly…
ggplot(oslo_trangbodd) +
  geom_sf(aes(fill = trangbodd)) +
  scale_fill_gradient(low = "...", high = "...") +
  theme_void()
You should make a mental note that there are several functions for colors named scale_*. This one is for fill colors, but there is also one for line colors, and each of these can be discrete or continuous based on variables in the data, or listed manually, etc. We will not go into all of that here. Using colors is a science (or art) in itself, and you can get an introduction in Wickham’s book on ggplot.
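As a small taste of the other scale_* functions, here is a sketch with one manual scale for a discrete fill and one gradient for line colors. It uses the North Carolina shapefile bundled with sf as stand-in data. The column many_births is created here purely for illustration, and the colors are arbitrary choices.

```r
library(sf)
library(ggplot2)

# Stand-in data: North Carolina counties bundled with sf (not the Oslo data).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical discrete variable: is the county above the median number of births?
nc$many_births <- nc$BIR74 > median(nc$BIR74)

# Discrete fill with manually listed colors
p_manual <- ggplot(nc) +
  geom_sf(aes(fill = many_births), color = "white") +
  scale_fill_manual(values = c("FALSE" = "grey85", "TRUE" = "steelblue")) +
  theme_void()

# Continuous scale for the line (border) color instead of the fill
p_lines <- ggplot(nc) +
  geom_sf(aes(color = BIR74), fill = NA) +
  scale_color_gradient(low = "grey80", high = "black") +
  theme_void()

p_manual
p_lines
```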
The typical situation is that you have a dataset that contains only the values you want to put on a map, but not the map itself. For example, we have a dataset on characteristics of each of the districts of Oslo.
Take a look at the data using glimpse():
glimpse(... )
The variable for the district (bydel) has a numeric code, and then there are variables with characteristics of that district.
What you lack is the map itself. You need to get it from somewhere else. A range of maps is freely available from Geonorge. What you need is a shapefile, which R can read into an sf object. We do not cover that here, but you can read about it on the sf developer page. Handling map data is a big topic in itself, so we proceed from having the map in the right format.
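For the curious, reading a shapefile usually comes down to a single call to st_read(). The sketch below reads the small North Carolina shapefile that is bundled with the sf package, since no Geonorge file is assumed to be available here. With your own download you would pass the path to the .shp file instead.

```r
library(sf)

# Path to an example shapefile that is installed together with sf.
# For your own map, replace this with the path to your .shp file.
shp_path <- system.file("shape/nc.shp", package = "sf")

# Read the shapefile into an sf object (quiet = TRUE suppresses the summary).
nc <- st_read(shp_path, quiet = TRUE)

class(nc)   # an sf object
names(nc)   # ordinary columns plus the geometry column
st_crs(nc)  # the coordinate reference system stored with the map
```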
To make a map, you need the map itself, but you might have the variables you would like to show in a separate dataset. The map is stored in an object oslo and data on median incomes are in inntekt.
These two datasets need to be merged. You might have noticed that both have a column named “bydel” with a code for the district. These are the official administrative codes for districts, so the datasets can be linked on this variable.
Linking relational data is explained in more depth in R4DS. Here, we use the function left_join() to link the two datasets on the common column “bydel”.
oslo_inntekt <- left_join(oslo, inntekt, by = "bydel")
The code creates a new object with the combined data. left_join() keeps all the rows in the first object (oslo) and adds the matching columns from the second (inntekt).
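What the join does can be seen in a small sketch with made-up data. The three square districts, their codes and the income figures are all hypothetical, and make_square() is a helper defined only for this example.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0).
make_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# A toy map with three districts ...
toy_map <- st_sf(
  data.frame(bydel = c("01", "02", "03")),
  geometry = st_sfc(make_square(0), make_square(1), make_square(2))
)

# ... and a toy dataset with values for only two of them.
toy_values <- data.frame(bydel = c("01", "02"), income = c(500, 620))

toy_joined <- left_join(toy_map, toy_values, by = "bydel")

toy_joined          # all three districts are kept; district "03" has income NA
class(toy_joined)   # the result is still an sf object, ready for geom_sf()
```

Putting the map first matters: the result keeps every district and the geometry column, even where the second dataset has no matching row.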
Now that you have it all in one object, it is your turn to create a map that shows how the median income is distributed in Oslo. Make the colors go from white (low) to black (high), and use a theme that removes the coordinate system and the background. Store the graph in an object p_income.
p_income <- ggplot(...) +
  geom...(aes(fill = ...)) +
  scale_fill_gradient(... = "white", ... = ...) +
  theme...()
p_income
Hint: Did you specify geom_sf() and the right variable to use for color?
Hint: In scale_fill_gradient() you should set low = "white" and high = "black".
Hint: Any theme will give you something, but you need to use theme_void() to get this right.
p_income <- ggplot(oslo_inntekt) +
  geom_sf(aes(fill = alle)) +
  scale_fill_gradient(low = "white", high = "black") +
  theme_void()
p_income